ABSTRACT

This monograph focuses on educational research on the
processes and nature of problem-solving activities in mathematics.
The first chapter presents an overview of both the field and the
document itself. All of the studies reported reflect interrelated
investigations carried out at the University of Wisconsin-Madison as
partial fulfillment of Ph.D. requirements in Mathematics or
Curriculum and Instruction. Chapter two describes 31 studies carried
out between 1969 and 1978, and divides the research into three
categories: instruction in heuristics, assessment of problem-solving
performance, and correlates and factors of problem-solving
performance. The next four sections are reports of studies on
teaching problem solving, chapters seven and eight detail
investigations on assessing problem-solving performance, and the last
three portions describe studies on establishing correlates and
factors of problem-solving performance. (MP)

Chapter 1



Studies on Mathematical Problem 
Solving: An Overview 

Thomas A. Romberg 

"Problems worthy of attack, prove their worth by hitting back."
(Hein, 1966)

Some mathematics students find joy in attacking worthy problems, and
some mathematics teachers find joy in instructing their students on how to
attack such problems. This monograph on problem solving addresses the
following questions: "How can we teach problem solving know-how?"; "Who
has problem-solving capabilities?"; and "What other intellectual abilities
are related to that capacity?"

To introduce this monograph, I have chosen an example of a mathematics
problem given to me to solve.

Given intersecting spheres A and B with B passing through the center of 
A, find a formula for the surface area of B contained in A. (Polya, lec- 
ture notes, 1960) 

I vividly remember when I perceived the solution to this problem. I
was walking in the Quad at Stanford during the lunch hour after vainly
struggling for at least a day to discover an appropriate relationship which
might lead to a solution. In an instant, I realized that if the extreme cases
of sphere B contained in sphere A and still intersecting it were considered,
they had the same surface area. Although there was much work still to be done
to prove my insight for a general case, I was convinced I had solved the
problem. This incident, which occurred nearly 20 years ago, is only one of
many I could relate which evolved from a series of problem-solving seminars
offered by Professor George Polya of Stanford University for mathematics
teachers, sponsored by the National Science Foundation.
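
The extreme-case insight can be checked with a short calculation. The sketch
below is not taken from the source and is not presented as Polya's or the
author's own argument. It assumes that sphere A has radius R and center O,
that sphere B has radius r and center C with |OC| = r (so that B passes
through O), and that R <= 2r so the two spheres actually intersect. The
symbols R, r, O, C, and h are introduced here only for illustration.

```latex
% Coordinates: O at the origin, C at (r, 0, 0).
% A point P = (x, y, z) lies on B exactly when
\[
  (x - r)^2 + y^2 + z^2 = r^2
  \quad\Longleftrightarrow\quad
  |P|^2 = x^2 + y^2 + z^2 = 2 r x .
\]
% P lies inside A exactly when |P| <= R, that is, when
\[
  x \le h, \qquad h = \frac{R^2}{2r} \quad (\text{and } h \le 2r \text{ because } R \le 2r).
\]
% The part of B inside A is therefore a spherical cap of height h on a
% sphere of radius r, whose area is
\[
  S = 2 \pi r h = 2 \pi r \cdot \frac{R^2}{2r} = \pi R^2 .
\]
% Extreme cases: r = R/2 gives all of B, with area 4\pi (R/2)^2 = \pi R^2;
% as r grows without bound, B flattens toward a plane through O and the
% cap tends to a disc of area \pi R^2.
```

Under these assumptions the area does not depend on r at all, which is
consistent with the observation that the two extreme cases share the same
surface area.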

I chose this example for three reasons. First, while the roots of the
individual studies reported here are a part of each author's background and
training, all of contemporary mathematics education has been significantly
influenced by George Polya and his writings on mathematical problem solving.
In particular, Mathematical Discovery (Polya, 1962) was used as a reference
book in courses taken or taught by all of the contributors to this volume. The
above problem, assigned to me by Polya, is illustrative of the types of
problems he used to teach problem solving. The strategy I used, i.e., looking
at extreme cases, is one he advocates. His influence on me was considerable.
Although I had a great deal of mathematics training, had worked as an applied
mathematician, had taught high school and college mathematics, and had even
solved a few interesting mathematical problems, I changed my orientation when
I took my first course from him in 1960. He clarified my thoughts about
mathematics and the teaching of mathematics, and improved my problem-solving
know-how. His books have done the same for us all.

Second, while for many educators mathematics consists of a large set of
concepts and skills to be mastered, to most mathematicians the capability of
solving problems that "hit back" is the essence of the discipline. As Polya
(1962) stated:

Solving a problem means finding a way out of a difficulty, a way around
an obstacle, attaining an aim that was not immediately attainable.
Solving problems is the specific achievement of intelligence, and
intelligence is the specific gift of mankind: solving problems can be
regarded as the most characteristically human activity. (p. vii)

While it is true that problem solving is an intellectual activity
associated with all areas of inquiry, mathematics is one area where problems
"worthy of attack" can readily be posed, and from such problems the intellect
can practice problem solving. Thus, this monograph is limited to mathematical
problem solving.

Third, this problem was assigned to a group of mathematics teachers,
not mathematicians, psychologists, sociologists, or curriculum writers. For
classroom teachers like myself who have experienced the exhilaration of
solving a problem, a fascination grows in spite of the difficulty and
frustration one often encounters in attempting to solve problems. Teachers
become interested in how to teach the know-how (the strategies or heuristics)
of problem solving to their students. Teachers would like students to enjoy
the exhilaration that accompanies successful problem solving. Thus, one worthy
educational problem is: "How does one teach problem-solving skills?"
Furthermore, any teacher who has attempted to teach problem-solving strategies
finds only a small group of students enjoying and being able to solve
problems, while a number of students are totally frustrated. Teachers would
like to identify those students who have an aptitude for solving problems.
This involves both directly assessing problem-solving performance and
identifying correlates of such performances.

Again, the emphasis reflected in this monograph parallels these two
concerns for teachers: namely, the teaching of problem-solving heuristics and
the identification of students with problem-solving aptitude.

The Chapters in This Monograph 

It is important to see the nine studies reported in this monograph in
relation to the extensive body of research literature on problem solving. In
this introductory chapter I outline my approach to the study of mathematical
problem solving and briefly discuss each study's location with respect to that
outline. But, first, let me briefly summarize the other chapters of the
monograph.

Chapter 2: Problem solving in mathematics, 1969-1978. In this chapter,
prepared by John Harvey, 31 research studies in problem solving, conducted
between 1969 and 1978, are described. These studies all fall into one of
three categories: instruction in heuristics, assessment of problem-solving
performance, and correlates and factors of problem-solving performance.

The next four chapters are reports of studies on teaching problem
solving.

Chapter 3: The small group discovery method, 1967-1977. In this
chapter Neil Davidson describes an instructional technique which he calls
"the small group discovery method." After describing this method he details
its initial tryout and the subsequent uses which have been made of it.

Chapter 4: Development of a unit of number theory for use in high
school, based on a heuristic approach. Shlomo Libeskind discusses his
development of a number theory unit based on a heuristic approach. This
chapter presents the data which Libeskind gathered when he tried out the
number theory unit with high school students enrolled in the Michigan State
University Inner City Program.

Chapter 5: An exploratory study on the diagnostic teaching of heuristic
problem-solving strategies in calculus. This landmark study by John Lucas is
a pivotal chapter in the monograph. Many of the studies which are
subsequently detailed depend upon the Lucas study and his description of the
Polya problem-solving heuristics, the thinking aloud procedure, and the
methodology for summarizing and analyzing the process-product data arising
from use of that procedure. In addition Lucas's chapter describes his attempts
to teach the Polya heuristics to college students in a calculus course.

Chapter 6: A multidimensional exploratory investigation of small
group-heuristic and expository learning in calculus. Norman Loomer, using
Lucas's refined procedures for gathering and analyzing process-product data,
evaluates Davidson's small group discovery method for teaching Polya's
problem-solving heuristics.

The next two chapters report studies on assessing problem-solving
performance.

Chapter 7: A study of problem-solving performance measures. Donald
Zalewski describes the development of a paper-and-pencil problem-solving
instrument for seventh-grade students, the use of this instrument, and his
attempts to correlate the results with data obtained using the thinking aloud
procedure.

Chapter 8: Development of a test of mathematical problem solving
which yields a comprehension, application, and problem-solving score. Diana
Wearne traces the development of an instrument designed to measure the
problem-solving performance of fourth-grade students. The chapter presents
data regarding the validity and reliability of the resulting instrument and
details the tryout of the instrument. This instrument was also used in the
studies reported by Meyer and Whitaker.

The last three chapters are reports of studies on establishing correlates
and factors of problem-solving performance.

Chapter 9: Mathematical problem-solving performance and intellectual
abilities of fourth-grade children. This chapter, by Ruth Ann Meyer, reports
an investigation of relationships between mathematical problem-solving
performance and intellectual abilities. She gave 19 tests on intellectual
ability and one on problem-solving performance, and used factor analytic
techniques to isolate six factors related to problem-solving performance.

Chapter 10: Sex, visual spatial abilities, and problem solving. Ann
Schonberger reports her investigation of sex differences, spatial ability, and
problem-solving performance. In addition, this chapter reviews the research
literature concerned with the relationships between spatial ability and sex
differences.

Chapter 11: Relationships between selected noncognitive factors and
the problem-solving performance of fourth-grade children. This chapter, by
Donald Whitaker, details a study in which he investigated relationships
between problem-solving performance of children and both children's and
teachers' attitudes toward problem solving in mathematics.

An Approach to the Study of Problem Solving 

In terms of approach, I have chosen to organize ideas about problem
solving by using a basic stimulus-response framework (see Figure 1). One can
discuss problem solving as task or stimulus specification (the observable
characteristics of a worthy problem), as process (the distinctive cognitive
processes used to attack a problem), or as product (the distinctive
characteristics of the responses as a result of attacking a problem).

In all of the nine studies some attention was given to task specification.
Problems in each study are assumed to be mathematical in nature and to
require the use of mathematical concepts and skills to find a solution. Thus,
this volume is not about the applications of mathematics to other problem
situations. In particular, no study is about how to develop mathematical
modeling skills. I recognize that mathematical modeling is an important
ability. It undoubtedly has a close relationship with problem solving, but
that is not the emphasis of this document.

Figure 1. The Basic Stimulus-Response Framework.

In Schonberger's study (Chapter 10), the differentiation between
spatial problems and quantitative problems is central to studying questions
about how boys and girls approach problems in different ways. And for Wearne's
study (Chapter 8), a hierarchical differentiation of questions about
mathematical problems is of paramount importance.

Similarly, all of the studies make assumptions about the psychological
processes used to attack problems. In the psychological literature on problem
solving, two principal kinds of problem solving have been distinguished. The
"trial-and-error" approach involves a series of successive approximations.
The "insightful" approach involves a discovery of a meaningful means-end
relationship underlying the problem (Ausubel, 1968). Only insightful problem
solving is considered here. Insight may involve either a simple transposition
of a previously learned principle to a new situation, or a cognitive
restructuring and integration of experience to fit the demands of a designated
problem. Characteristically, insightful solutions emerge suddenly. However,
solutions are not always complete. They often appear after a protracted period
of inauspicious search spent in pursuing unpromising leads.

Insightful problem solving is a type of meaningful discovery learning
in which problem conditions are nonarbitrarily related to existing cognitive
structure. Solving such problems involves going beyond the information given
by transforming information through analysis, synthesis, rearrangement,
recombination, etc. The mathematical techniques we call heuristics, assumed to
be useful in transforming information, are those discussed by Polya (1945,
1954, 1962). In particular, see Lucas's analysis of Polya's heuristics
(Chapter 5). What should be clear is that although psychological processes
associated with problem solving are being examined, the studies reported here
do not attempt to clarify the intellectual processes that one uses when
solving a problem. In essence, these studies are not basic psychological
studies. However, in the last three chapters Meyer, Schonberger, and Whitaker
examine the relationship of measures of other psychological factors to
measures of problem-solving capability.

To assess the use of heuristics identified by Polya when solving
problems, the coding procedures originally developed by Kilpatrick (1967)
were followed. These coding procedures were for verbal protocols derived
from students when instructed to "think aloud" while solving problems. To
code his data Lucas (Chapter 5) adapted Kilpatrick's procedures for calculus
students. Loomer (Chapter 6) and Zalewski (Chapter 7) then used variants
of Lucas's coding in their studies. In particular, Zalewski used video
recordings so that use of heuristics could be coded from visual as well as
oral data.

Four papers in this monograph focus on teaching students to use
heuristics for solving problems. Lucas and Libeskind rely on guided or
arranged discovery. Davidson and Loomer, on the other hand, rely on
small-group dynamics. In varying degrees all studies demonstrate that students
can improve at solving problems. However, since all have given their subjects
ample opportunity to solve problems, the long-debated "opportunity-to-learn"
question in the literature is not clarified. Briefly, some psychologists, such
as Ausubel (1968), have argued that because so few students are capable of
solving problems, it is not a good use of time to try to teach all students
problem-solving skills. This belief implies that those who are capable will
develop those skills naturally. On the other side, Polya argues that problem
solving, like other skills, needs to be practiced.

Finally, Whitaker, while not examining the teaching act itself, is
interested in the attitudes teachers bring to the teaching of problem solving.

All of the studies consider products or responses. Each study examines
whether problems are solved correctly or not, and if errors are made, the
errors are classified. In particular, Zalewski and Wearne use the pattern of
responses by individual students on instruments they developed to cluster the
students. Zalewski's items were the basic set of items from which Schonberger
selected items for her study. And, the instrument developed by Wearne was
used by Meyer and Whitaker in their studies.

In summary, this monograph reports some interesting, interrelated
studies conducted at the University of Wisconsin-Madison. All of the studies
were carried out to partially fulfill the Ph.D. requirements in Mathematics or
Curriculum and Instruction. Either Professor Harvey or Professor Romberg
chaired each thesis committee. All but one of the studies were partially
supported by the Wisconsin Research and Development Center for Individualized
Schooling.

Chapter 2

Problem Solving in Mathematics: 
1969-1978 

John G. Harvey 

In 1969 Kilpatrick (1969, 1970) ably and comprehensively reviewed
research in problem solving in mathematics. This chapter will update portions
of that review for the years 1969 to early 1978.

Before beginning, it seems wise to briefly discuss the criteria used in
selecting the studies described since this review will not be as comprehensive as 
Kilpatrick's. The problem-solving studies conducted at the University of Wis- 
consin from 1968 to 1977 and reported in this volume have implicitly or ex- 
plicitly used the following definition of a mathematical problem and of mathe- 
matical problem solving. 

A mathematical problem is a situation which poses a question or defines
an objective in light of some given information or conditions; the
individual attempting to answer the question or meet the objective does not
possess an immediate solution; hence the solution process, or act of solving
a mathematical problem, requires active search, prior knowledge of
mathematics, and a repertoire of heuristic strategies (Lucas, 1972, p. 10).

In addition the Wisconsin studies fall into three areas of problem-solving
research: (a) instruction in heuristics, (b) measurement of problem-solving
performance, and (c) correlates and factors of problem-solving ability. As a
result this review only reports studies meeting these two criteria: The
problems used were mathematical problems and the research is in one of the
three areas named.

The second criterion is also used to organize the majority of this
chapter. The next section will describe studies reporting attempts to teach
heuristic strategies and the results of those attempts. The measurement of
problem-solving performance will be the subject of section two. The third
section details research which sought correlates and factors of
problem-solving ability.

Instruction in Heuristics 

Single Treatment Studies

Two chapters in this volume, those by Libeskind and Davidson, describe
the initial tryouts of new instructional systems designed to teach
problem-solving heuristics. Appropriately, neither Libeskind nor Davidson
attempted to compare their new instructional system to "conventional" or to
other innovative instructional systems; instead they focused their attention
on the components of their respective systems to determine if they functioned
as planned. The resulting study can be termed a "single treatment study"; this
part reports four other single treatment studies attempting to teach
problem-solving heuristics.

In his study Gallo (1975) examined the role of two problem-solving
processes. One of them, which he termed Integration, is the capacity to
integrate other problem-solving processes into the sequence of operations
required for problem solution; the second, termed Evaluation, is the capacity
to judge whether an attempted solution is correct. These two processes and
three other, unspecified problem-solving processes were taught to sixth-grade
subjects. The treatment used was structured so that each of the processes
could be taught in the context of computing the area of a triangle. This
prevented inadvertently teaching the interrelation between the processes, and
permitted the inclusion or omission of either or both of the processes of
Integration and Evaluation. Gallo's results showed that when his subjects had
learned both processes the solution rate was nearly perfect and when either
was absent the solution rate did not exceed chance. The number of subjects,
the length and duration of the treatment, the kind of problem-solving
instruments employed, and the way the problem-solving instruments were used
were not described in the abstract of this study.

An exploratory study by Dalton (1975) attempted to determine
whether there were patterns in the thinking processes used by students of
average or below average ability in mathematics. Next, it described the
existing patterns, and determined the effects of "guiding questions" upon the
thinking processes used and upon finding correct solutions. The subjects were
44 ninth-grade general mathematics students; they were assigned to an
experimental and a control group of 22 students each. In both the experimental
and the control group the students were asked to think aloud while solving
three word problems; these individual thinking aloud interviews were tape
recorded. In the experimental group the students were asked "guiding
questions" during the interview. In his abstract Dalton did not report the
length of the thinking aloud interviews, the number of "guiding questions"
asked of subjects in the experimental group, or the way the data were
analyzed. He did report that the tape recordings were transcribed and coded,
that the errors were analyzed, and that his general observations of the
subjects were used. He concluded that there were patterns in the thinking
processes of his subjects; two modes of thinking, deduction and
trial-and-error, were used; and subjects who used trial-and-error tended to be
more effective problem solvers. Dalton further states that "the effects of
asking students 'guiding questions' were not determined conclusively."

The study by Kantowski (1977) is similar to Davidson's in that it
spanned 8 months of a school year and to those of Libeskind, Lucas, and
Loomer in that the treatment embodied the heuristics identified by Polya
(1957, 1962, 1965). The subjects in this four-phase, clinical investigation
were eight high-ability ninth-grade algebra students (four females, four
males). The first phase was an eight-problem pretest. The second phase was
readiness instruction (three lessons per week for 4 weeks) intended to
acquaint the subjects with heuristic instruction and to introduce them to
using heuristics in problem solving. This phase concluded with a test. The
third phase, 4 months in duration, was heuristic instruction in geometry.
There were three units of geometry content. Each unit consisted of six initial
instructional episodes, a midunit test, six more instructional episodes, and
an end-of-unit test. The fourth phase was a two-part posttest. One part
consisted of geometry and verbal problems; the other part, of prerequisite
knowledge needed to solve the geometry and verbal problems. The number of
items in the phase-two test, the phase-three tests, and the two parts of the
posttest were not given. All of the tests were individually administered.
During each test the subjects were encouraged to think aloud as they solved
problems. Each problem-solving interview was tape recorded, the subjects'
protocols were analyzed from these tapes using a modification of the coding
scheme developed by Kilpatrick (1967), and a process-product score was
assigned to each problem solution.¹

Using the process-product score Kantowski (1977) calculated a median
decimal score for each of the eight subjects. Then, for each subject, she
determined the percentages of problems in which the problem-solving
processes were used. Percentages were calculated for problems with scores
above and with scores below the median. Based upon these data Kantowski
reported the following: (a) 59 to 95% of the problem solutions with scores
above the median showed evidence of the use of goal-oriented heuristics, while
at most 52% of the problem solutions with scores below the median showed
indication of their use; (b) the tendency to use goal-oriented heuristics
increased as problem-solving ability developed; (c) the percentage of problem
solutions indicating the use of goal-oriented heuristics ranged between 14 and
72% with a median of 36% on the pretest, and between 14 and 100% with a
median of 72% on the posttest²; (d) successful problem solvers manifested
regular patterns in using the processes of analysis and synthesis, and there is
an interrelationship between these regular patterns and using goal-oriented
heuristics; and (e) the subjects seldom used the heuristic of looking back.
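
The per-subject median split described above can be sketched in a few lines.
This is only an illustration under stated assumptions, not Kantowski's actual
procedure: the function name, the data layout, and the sample scores are
hypothetical, and solutions scoring exactly at the median are left out of both
groups because the source does not say how ties were handled.

```python
from statistics import median


def heuristic_use_by_median_split(solutions):
    """Split one subject's solutions at that subject's median score.

    solutions: list of (score, used_heuristic) pairs for ONE subject.
    Returns (pct_above, pct_below): the percentage of solutions showing
    heuristic use among those scoring above and below the median.
    Solutions scoring exactly at the median are excluded from both groups
    (an assumption; the source does not describe tie handling).
    """
    med = median(score for score, _ in solutions)
    above = [used for score, used in solutions if score > med]
    below = [used for score, used in solutions if score < med]

    def pct(flags):
        # None signals an empty group rather than a misleading 0%.
        return 100.0 * sum(flags) / len(flags) if flags else None

    return pct(above), pct(below)


# Hypothetical data for one subject: (process-product score, heuristic seen?)
example = [(0.2, False), (0.4, False), (0.5, True),
           (0.7, True), (0.9, True), (0.3, True)]
pct_above, pct_below = heuristic_use_by_median_split(example)
```

Repeating the calculation for each of the eight subjects would give one pair
of percentages per subject, which is the kind of range the findings in (a)
summarize.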

In his study Vos (1978) chose the following three key organizers:
drawing a diagram, approximating and verifying, and constructing a chart.
He hypothesized that these organizers would potentially increase success in
problem solving. For each organizer he developed an instructional treatment
of six presentations. Using a pretest-posttest design Vos taught the three
treatments to 21 randomly selected subjects from grades six, seven, and eight
(seven at each grade level) over a 14-week period. The pretest instruments
consisted of a test of mathematics ability, a learning style inventory (a
modification of Learning Style Inventory-A [Kolb, Rubin, & McIntyre, 1974]),
and a problem-solving test. The posttest instruments were a practical judgment
test (developed from Tate & Stanier, 1964), a Problem Solving Decision Test
(Vos, 1976), and a problem-solving test. The problem-solving pretest and
posttest were individually administered and consisted of three and six items,
respectively. Each subject was instructed to think aloud during the
tape-recorded interviews. Using the process coding scheme developed by
Kantowski (1977), a process-product score was assigned to each solution for
the problems in the problem-solving tests. Based on the pretest and posttest
data Vos concluded the following: (a) each of his instructional treatments was
successful, (b) his subjects did use the three key organizers in
problem-solving situations, (c) there was a relationship between effective
application of the key organizers and success in problem solving, and (d) for
eighth-grade subjects, there was a strong relationship between problem-solving
success and practical judgment.

¹ The Kilpatrick coding scheme and the way in which process-product scores
are assigned are more fully described in Chapter 5.

² The range does not seem to be a good representation of change from pretest
to posttest in this case. If the posttest score of one subject is deleted,
then the range is from 72 to 100% with a median of 72% (N = 7).

Treatment Comparison Studies 

At present the more conventional educational research paradigm is to
compare the effects of one treatment to the effects of another. The studies
reported here are of that kind. However, the studies are further subdivided
into those in which a heuristic treatment was compared to a conventional one,
and those in which more than one heuristic treatment was used.

Heuristic vs. conventional instruction. Leggette (1974) attempted to
determine if instruction in heuristic processes would increase the
problem-solving performance of capable, but poorly prepared college freshmen.
Four intact classes, totaling 70 college freshmen, were assigned to an
experimental group and a control group. Two instructors were randomly assigned
to one of the experimental and one of the control classes. Both the control
and experimental treatments lasted 9 weeks; during the treatment period the
control classes "followed normally scheduled class procedures." During the
first week of the treatment period each experimental class received 3 hours of
instruction on problem-solving processes; for the rest of the treatment period
those classes were taught mathematics using a problem-solving approach. The
Basic College Mathematics Problem-solving Test and the Aiken Revised
Mathematics Attitude Scale (Aiken, 1963) were administered to both groups as
pre- and posttests. There were no significant differences (p = .01) in
problem-solving performance between the experimental and the control group.
Analysis of variance was used to determine if there were significant
differences between the problem-solving mean gain scores and the attitude mean
gain scores of the two treatment groups. It was concluded that: (a) the
experimental treatment increased the problem-solving ability of capable, but
poorly prepared college freshmen more than the control treatment; (b) the
experimental treatment "should cause students to develop a better attitude
toward mathematics"; (c) these freshmen could be taught the structure of
problem solving without affecting the amount of mathematics content taught;
and (d) an undergraduate mathematics course with a unit on problem solving
could be introduced.

Post and Brennan (1976) compared a general heuristic treatment with
normal instruction in tenth-grade geometry. In the spring of 1972, 94
tenth-grade students were pretested using an investigator-developed
problem-solving instrument. The subjects' scores were rank ordered, and pairs
of persons with adjacent or coincident pretest scores were formed. One student
from each of the resulting pairs was randomly assigned to the experimental
group and the other to the control group. A median split divided both groups
into high and low cells. The experimental classes were given teacher-directed
large-group instruction which emphasized solving problems using the
experimenters' General Heuristic Problem-solving Procedure. The control group
continued normal instruction in geometry. Post and Brennan did not specify
length of treatment. The same instrument was administered for both the
pretest and the posttest. Two-way analysis of variance was used to compare
the posttest means of the experimental and control groups. There were no
significant treatment effects or interactions. There was a significant
difference (p < .01) due to ability level.
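
The matched-pair assignment described above (rank order the pretest scores,
pair neighbours, then randomly split each pair) can be sketched as follows.
This is a minimal illustration, not the investigators' procedure as
implemented: the function name, the student identifiers, and the scores are
hypothetical, and leaving the last student unassigned when the count is odd is
an assumption (the study had an even count of 94).

```python
import random


def matched_pair_assignment(pretest_scores, seed=None):
    """Assign one member of each score-matched pair to each group.

    pretest_scores: dict mapping student id -> pretest score.
    Students are rank-ordered by score, neighbours in the ranking form a
    pair, and a random draw decides which member goes to which group.
    With an odd number of students the last-ranked one is left unassigned
    (an assumption made only for this sketch).
    """
    rng = random.Random(seed)
    ranked = sorted(pretest_scores, key=pretest_scores.get, reverse=True)
    experimental, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random order within the pair
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control


# Hypothetical pretest scores for eight students
scores = {"s1": 12, "s2": 15, "s3": 9, "s4": 15,
          "s5": 7, "s6": 11, "s7": 3, "s8": 10}
experimental, control = matched_pair_assignment(scores, seed=0)
```

Pairing before randomizing keeps the two groups close in pretest ability,
which is why a later difference between them is easier to attribute to the
treatment than to initial ability.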

Lee (1977) sought to improve the heuristic problem-solving behaviors
of fourth-grade students in his exploratory study. Using teachers'
recommendations and students' performance on two Piagetian problems
(Equilibrium in the Balance and Oscillation of a Pendulum), 16 subjects were
selected for this experiment: eight average achievers who met Piaget's
criteria of II-A cognitive level on both problems and eight high achievers who
met Piaget's criteria of II-B cognitive level. Two groups of equal size, an
experimental and a control group, were formed by random assignment of subjects
within a stratum. The experimental group was instructed on the use of
heuristics when solving word problems. Although the treatment given to the
control group was not specified in the abstract, it seems reasonable to assume
that they continued to receive their usual instruction in fourth-grade
mathematics. The experimental treatment lasted for 8 weeks; during that time
there were 20 instructional sessions of 45 minutes each. Pre- and posttests
were given to both treatment groups. The pretest consisted of two problems;
the posttest, six problems. Four weeks after the end of the treatment period
the experimental group was given two additional problems to solve. Tape
recordings were made during the individually administered testing sessions. In
addition subjects' worksheets and the investigator's remarks were collected.
On the posttest subjects in the experimental group solved 35 of the 48
problems (73%); control group subjects solved 3 of the 48 problems (6%).
Subjects in the experimental group solved 80% of the problems presented to
them during the 4-week follow-up testing. The investigator reported the
following: (a) there was no change in the use of heuristics by control group
subjects while there was a noticeable increase in their usage by experimental
group subjects, (b) subjects in the experimental group "were able to select an
appropriate heuristic for nearly all the post-experimental interview
problems," and (c) there was a difference in the use of heuristics between
those subjects classified as meeting Piaget's criteria of II-A cognitive level
and those meeting II-B.

Ledbetter (1978) attempted to isolate an aptitude-treatment interac- 
tion in her study of heuristic problem solving. A total of 84 college freshmen 
were randomly divided into an experimental and a control group. During the 
10-week treatments, experimental subjects received instruction on problem
solving and the use of heuristic strategies, while the control subjects were
instructed in college algebra and trigonometry. All subjects took five ability
pretests. A Solomon four-group design provided data on problem-solving
performance and problem-sorting schemes (Silver, 1978) for approximately half
of the experimental and control group subjects. There were posttest measures 
of problem-solving performance, algebra and trigonometry performance, and 
problem-sorting schemes; the posttest instruments were administered to all 
subjects. The nine-item problem-solving test included problems solved by 
three heuristic strategies (algebraic symbolism, contradiction, and pattern 
generation) and incorporated three contextual cues (triangle, number, and 
word problems). The problem-sorting schemes data were gathered using a 
problem-similarity questionnaire that required subjects to rate each of nine 
pairs of problems on a continuous similarity scale. Experimental subjects
outperformed control subjects on the problem-solving posttest (p < .01), while
the contrary was true on the algebra-trigonometry posttest (p < .001). A
complete-link clustering analysis of the problem-sorting scheme data indicated 
that few differences in dominant clustering schemes could be observed. A
heuristic sorting score was significantly correlated with problem-solving
performance (p < .04); the correlation coefficient was not stated. A
hierarchical clustering analysis of the ability test data isolated four
homogeneous ability profile groups. Analysis of variance showed that the
heuristic sorting score was related to ability profile group (p < .01) and to
treatment group (p < .001). Subjects in the experimental group received higher
sorting scores. To test for aptitude-treatment interactions, the
problem-solving posttest was divided into three subtests corresponding to the
three heuristic strategies taught to the experimental group. Results showed
that only one of the ability profile groups performed significantly better
(the p-level was unspecified) across all three subtests following treatment.

Like the four studies just described, the next two studies attempted to
determine the effects of teaching problem-solving heuristics to their subjects.
However, the two remaining studies also attempted to determine the effects of
heuristic instruction on students of the subjects.

The following three outcomes were investigated by Lipson (1972): the
subjects' problem-solving performance, the problem-solving performance of
children taught by the subjects, and the subjects' teaching behavior. The
subjects for this study, 43 senior mathematics majors enrolled in a secondary
school mathematics methods course, were divided into three cells: students
who had participated in an experiment as freshmen and had received instruction
on heuristics, students who had participated as freshmen and had not
received heuristics instruction, and students who had not participated as
freshmen. The majority of the subjects were in the first cell. Half of the 43
subjects were assigned to the experimental treatment, a seminar on heuristics,
and the other half to the control treatment, continued participation in the
regular methods class. The abstract did not describe the way subjects were
assigned to treatments or cells. It did explain that "the subjects were
partitioned into six subsamples on the basis of treatment and freshmen
experience." Problem-solving pre- and posttests were administered to all
subjects; there was an intervening treatment period whose length was not
specified. While the subjects were student teachers, they administered
problem-solving pre- and posttests to their students. During the same period,
trained observers recorded the heuristic activities of the subjects as student
teachers. Analysis of variance of the pretest-posttest gain scores
demonstrated that there were no significant differences between the six
subsamples. When the 43 subjects were divided into groups who scored low,
medium, or high on the pretest, a two-way analysis of variance yielded a
significant difference (p < .01) favoring the subjects in the experimental
treatment. A one-way analysis of variance was used to compare the pre- and
posttest means of classes taught by student teachers; there were no
significant differences (p < .01) between the classes of the student teachers
from the six subsamples. Scheffé's method of multiple comparisons located
several significant contrasts (p < .01). These contrasts were not described in
the abstract, but it was concluded that, on the average, classes taught by
student teachers who participated in the experimental treatment had gained more
in problem-solving performance. There were too few instances of observed
heuristic teaching to permit statistical analysis. It was stated that the
subjects who participated in the experimental treatment and who had had
instruction in heuristics as freshmen showed more instances of heuristic
teaching. A higher pretest score was related to greater heuristic behavior as
a student teacher.

A similar study has been conducted by Tubb (1975). In his study
mathematics graduate teaching assistants were trained in heuristic questioning
strategies, Flanders' Interaction Analysis, or both. Problem-solving
performance of the graduate students and their calculus students was measured.
Several positive results are stated. However, the problems used in this study
do not satisfy the definition of a mathematical problem, and thus, the study
does not meet the criteria established for this review. Therefore, the study
will not be further described.

Comparison of heuristic treatments. The studies in this section compare
one or more heuristic treatments. They are distinguished from the previously
described studies because the treatments in this group are usually better
specified.

Pennington (1970) examined the following two approaches to the
teaching of heuristic strategies: the Behavioral Strategy Treatment and the
Conceptual Strategy Treatment. Strategies in the behavioral treatment were
specific, logical steps for problem solution, while conceptual strategies were
organizing principles. In addition two types of problem-solving practice were
considered, Selection and Reception. At the Selection level subjects arranged
their own learning sequence during training; at the Reception level there was
a predetermined learning sequence. Four instructional treatments were derived
by pairing each heuristic approach with each level of practice. The content
of each treatment consisted of structure problems developed "according to
a system specified by mathematical group theory." The subjects for this study
were sixth-grade students who, prior to participating in the experiment, were
taught modular arithmetic addition. In addition to the four instructional
treatment groups, subjects were also assigned to a control group. The number of
subjects in each group, the way subjects were assigned to groups, and the
length of the instructional period were not given in the abstract.
Problem-solving performance was measured by three acquisition tests
administered during training, and by learning and transfer tests administered
afterward. The content of the tests was not specified. The difference between
the treatment and control groups reached the (unspecified) predicted level of
significance on two of the three acquisition measures, and on all of the
learning and transfer measures. There were no other significant differences.

Foster (1973) hypothesized that a student who successfully programmed
a computer to solve a series of mathematical problems would develop his
or her problem-solving ability. For this posttest-only experiment, Foster
defined four treatments by specifying the kind of supplementary aids used in
each. They were no aids (Treatment 1), flow charts only (Treatment 2),
computer only (Treatment 3), and computer and flow charts (Treatment 4).
The subjects for this experiment were three intact eighth-grade classes of 24
students each. After dividing each class into two equal strata using reading
ability, stratified random sampling was used to assign subjects to one of the
four Vz-week treatments. The posttest was a 48-item, experimenter-constructed
test of nine problem-solving behaviors. A two-way analysis of variance
of mean performance on the posttest was used to determine the effects due
to treatment, class, and reading level. Significant F-values resulted for
reading within treatment (p = .01) and treatment X class (p = .05). Pair-wise
comparisons, using Scheffé's F-statistic, failed to show if these significant
F-values were within one treatment or a reading cell within a treatment. An
analysis using Dunnett's t-statistic revealed that the mean performance of
those using the computer only was significantly greater (p = .05) than those
using neither the computer nor flow charts. The mean performance of the four
treatment (T) groups had the directional order: T1 < T2 < T4 < T3.

In a study similar to Pennington's (1970), Smith (1973) compared
the effects of giving general versus specific heuristic advice. The general
heuristic taught was the planning heuristic successfully used by General Problem
Solver (Ernst & Newell, 1969); the specific heuristic taught applied best to 
the task being studied. The investigator developed three programmed study 
booklets on finite geometry, Boolean algebra, and symbolic logic, three tests 
covering the booklet material, and two transfer tests. The subjects, 176 college
students who had taken 2 years of high school mathematics, were assigned to
two treatment groups, the general heuristic treatment and the specific heuristic 
treatment. Within those treatments nine additional factors were identified 
(three orders of booklet presentation X three orders of booklet test administra- 
tion). Each treatment lasted for 3 weeks; the five investigator-designed tests 
were administered during the fourth week of the study. Information concern- 
ing the subjects' problem-solving methods was gathered by means of inter- 
views and questionnaires after the testing period. The interview procedures 
and the content of the questionnaires were not described in the study abstract.
The data were analyzed using a three-way analysis of variance. Subjects in the 
specific heuristic treatment group solved significantly more
(p < .001) logic problems and completed the Boolean algebra and logic tests
significantly faster (p < .05 and p < .01, respectively) than did subjects in
the general heuristic treatment group. There were no significant differences
between the treatment groups on the number of transfer problems solved and
the time required to solve them. There were no main effects for order and no
interactions. The questionnaire and interview results showed that one- to two- 
thirds of the subjects used the heuristics taught to them when completing a 
given learning test and that very few used the heuristics taught to them on the 
transfer tests. 

Training in heuristics was approached differently by Goldberg
(1975). She studied the effects of training in heuristics on the ability to
write proofs in number theory. Goldberg developed two sets of programmed
materials; one set provided heuristic instruction and the other did not. Three
treatments were designed: Treatment XT used the heuristic programmed
materials and classroom instruction reinforced those heuristics; Treatment X
used the heuristics programmed materials and classroom instruction did not
provide reinforcement; Treatment C used the nonheuristic programmed materials
and classroom instruction did not teach heuristics. Nine intact classes were
randomly assigned to these treatments; each class met for 75 minutes, twice
weekly for 6 weeks. During seven of these class meetings subjects worked on
the programmed materials; during the remaining five classes appropriate
classroom instruction was provided. At the end of the treatment period the
following four posttests were administered: a 25-item test of basic concepts
studied (Concepts I), a test requiring the construction of proofs (Proofs I), a
questionnaire designed to determine attitude toward the programmed materials,
and the Childhood Attitude Inventory for Problem Solving (Covington,
1966). Five weeks later two tests, Concepts II and Proofs II, were
administered. Parts of these two tests were parallel to Concepts I and Proofs
I; the remaining parts tested mastery of material studied subsequent to the
treatment period. There were no significant differences between treatment
groups in subjects' understanding of the basic concepts or their ability to
construct proofs. Students responded that the nonheuristics programmed
materials were more helpful, easier, and generally more appealing than the
heuristic materials. The results of the attitude inventory showed that
students given Treatment C had a more positive attitude toward the nature of
the problem-solving process than subjects who received Treatments X or XT.
Subjects given Treatment XT had more positive attitudes than subjects who
received Treatment X.

Pereira-Mendoza (1975a, 1975b) taught students to apply at least one
of two heuristics, examination of cases and analogy, to mathematical
problems. He also investigated the differences between learning these
heuristics alone or learning them in concert with mathematical content. He
specified three levels of treatment (heuristics only [H], heuristics and
content [HC], content only [C]) and three instructional vehicles (algebraic,
geometric, mathematically neutral). Nine self-instructional booklets were
designed which corresponded to each of the treatment by vehicle combinations.
The subjects (N = 294) were tenth-grade boys in an all-male Canadian high
school; they were randomly assigned to one of the nine groups. At the end of
the 10-day instructional period two transfer tests, one algebraic and one
geometric, were administered to all subjects. After eliminating tests on which
judges could not reach scoring agreement and then equalizing the group sizes
by random elimination of test scores, data from 189 subjects (21 per group)
were analyzed using analysis of variance. On the algebraic test the H
treatment groups, and on the geometric test the HC and H treatment groups
scored significantly higher than did the C treatment groups. The probability
level was not specified. There were no significant differences between the HC
and C treatment groups on the algebraic test or between the H and HC treatment
groups on the geometric test. An analysis of the pattern of heuristic
application revealed that both heuristics were employed on the algebraic test
and there was little evidence of the use of analogy on the geometric test.
There were no significant differences between the instructional vehicles.

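The step of equalizing group sizes by random elimination of test scores can
be illustrated with a short Python sketch. Everything concrete here is an
assumption made for illustration: the group sizes before elimination, the
scores, and the function name are invented, and only the 3 x 3 layout and the
final 21 subjects per group come from the description above.

```python
import random


def equalize_groups(scores_by_group, seed=None):
    """Randomly drop scores so that every group is as small as the smallest."""
    rng = random.Random(seed)
    target = min(len(scores) for scores in scores_by_group.values())
    # random.Random.sample draws `target` scores without replacement.
    return {
        group: rng.sample(scores, target)
        for group, scores in scores_by_group.items()
    }


treatments = ["H", "HC", "C"]
vehicles = ["algebraic", "geometric", "neutral"]
# Hypothetical numbers of scorable tests remaining in each of the nine groups.
hypothetical_sizes = [21, 23, 25, 22, 24, 21, 26, 22, 23]

data_rng = random.Random(0)
groups = {}
cells = [(t, v) for t in treatments for v in vehicles]
for cell, size in zip(cells, hypothetical_sizes):
    groups[cell] = [data_rng.randint(0, 20) for _ in range(size)]

balanced = equalize_groups(groups, seed=1)
total_analyzed = sum(len(scores) for scores in balanced.values())
```

In this sketch every group is reduced to 21 scores, so 9 x 21 = 189 scores
remain for the analysis of variance, matching the count reported above.
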
Vos (1976) compared three instructional strategies for promoting the
use of five problem-solving heuristics. The heuristics were (a) drawing a
diagram, (b) approximating and verifying, (c) constructing an algebraic
equation, (d) classifying data, and (e) constructing a chart. In the reception
treatment the subject was given only the problem task. In the list treatment
the subject was given the problem task and, after some time had lapsed, was
required to read a checklist of desirable problem-solving behaviors. Next, the
subject was instructed in specific problem-solving behaviors that could help
solve the given problem task before returning to it. In the behavior treatment
subjects were first instructed in specific problem-solving behaviors which
would help solve the subsequently given problem. The 133 subjects were
students in six ninth-, tenth-, and eleventh-grade mathematics classes at a
private high school. The mathematics classes and the numbers of subjects in
those classes were Algebra II (25), Geometry (29), Math Survey (21),
Elementary Algebra (8), and Algebra I (50). Using a one-factor randomized
complete block design with mathematics class as the blocking variable, subjects
within classes were randomly assigned to one of the three experimental
treatments. For each experimental treatment, instruction consisted of
investigator-developed, self-instructional materials supplemented by a teacher.
Over a 15-week period 20 problem tasks of 20 minutes each were given. Pretest
data consisted of scores from the Sequential Tests of Educational Progress
(STEP) (Cooperative Test Division, 1972) forms 2A and 3A, Mathematics Part II.
Posttest instruments were STEP, forms 2A and 3A, Mathematics Part I and
an investigator-constructed Problem Solving Approach Test (PSAT) and
Problem Solving Test (PST). At the .05-level there were no significant
differences between treatments except for subjects enrolled in Math Survey on
Part 1 of PSAT, a direct measure of the five problem-solving heuristics taught.

Gifted high school students were the subjects for the study conducted
by Hall (1976). He designed and validated a checklist of heuristics involved in
formulating problems from situations, and used this checklist to rate
performance of gifted students on situational problems before and after
instruction in situational problem solving. In addition to the situational
heuristic checklist a planning heuristics checklist, compiled from Polya's
list (1957), was also developed. A total of 156 superior secondary school
subjects, comprising 39 four-person teams, was randomly assigned to one of
three treatment groups: situational heuristics, planning heuristics, and
control. The length of the treatments, the nature of the posttest
administered, and the data analysis procedures were not described in the study
abstract. The results were that on situational problems the subjects in the
situational heuristics treatment group gave significantly more heuristics than
the control group (p < .001) and the planning heuristics treatment group
(p < .001). On these same problems the planning heuristics treatment group
gave significantly more (p < .05) heuristics than the control group. On
"well-defined problems" the planning heuristics treatment group gave
significantly more (p < .05) heuristics than the control group.

McClintock (1978) compared three treatments in his study: G1 which
taught calculator usage, Algebra I content, and problem solving; G2 which
taught problem solving and Algebra I; and G3 which taught calculator usage
and Algebra I. The subjects were average ability Algebra I students from a
private girls' school. The majority were from middle to upper middle class,
Anglo or Cuban families. The subjects had been randomly assigned to one of
three classes; there were 10, 17, and 9 students in treatment groups G1, G2,
and G3, respectively. The treatments lasted from mid-March until the first
week of June. Pretests and posttests were administered to gather data on
algebra achievement, inductive reasoning, and deductive reasoning. The tests
used were the Lankton First-Year Algebra Test (Lankton, 1965), the Necessary
Arithmetic Operations Test and Nonsense Syllogisms Test from the ETS Kit
of Reference Tests for Cognitive Factors (French, Ekstrom, & Price, 1969a,
1969b), and an investigator-developed number sequence test. These data were
analyzed through analysis of covariance, with the pretest data being used as
the covariate. In addition pre- and posttest problem-solving data were
collected from eight subjects from each of the treatment groups. These subjects
were in the upper half of their treatment groups on the other pretest measures.
Six pretest and eight posttest problems were given to each subject; these
problems were solved while the subjects thought aloud. The pretest and
posttest interview sessions were tape recorded and the taped protocols were
analyzed using a process coding scheme similar to the one used by Kantowski
(1977). The analysis of covariance revealed significant differences between
treatment groups on the Lankton First-Year Algebra Test (p < .05) and the
Nonsense Syllogisms Test (p < .01). There were no significant differences
between treatment groups on Necessary Arithmetic Operations or the number
sequence test. The adjusted mean performance of the three treatment groups
had the directional order G1 > G3 > G2 and G2 > G1 > G3 on the
Lankton First-Year Algebra Test and the Nonsense Syllogisms Test,
respectively. Analysis of the protocols indicated that all of the subjects
found the pre- and posttest problems difficult to solve, there was a
relationship between heuristic processes and productive inferences, in
approximately 83% of the problem-solving sessions subjects employed systematic
trial-and-error, and there was a marked increase in the use of algebraic
equations between pretest and posttest.

Assessment of Problem-solving Performance 

Instruction on problem-solving heuristics and assessment of
problem-solving performance are equally important to research on problem
solving. The studies described in this section assess and describe
problem-solving performance. The section will be divided into two parts: those
using thinking aloud to assess performance and those using techniques other
than thinking aloud.

Thinking Aloud Assessment 

Fuller (1972) sought to determine if students use different methods of
solving mathematics problems when under and not under time constraints.
Sixty-four subjects of average and above average intelligence were
individually administered two problem-solving tests. On one test subjects had
3 minutes to solve each problem, and on the other test they were told they
could have as much time as they needed. Subjects thought aloud during the
tape-recorded problem-solving interviews. The recorded protocols were analyzed
by the investigator; in the coding system she used, problem-solving processes
were grouped into four categories: reading the problem, rereading the problem,
deduction, and trial-and-error. To determine if a subject changed
problem-solving methods between the two tests a pattern of problem solving was
computed for that subject on each test. The two patterns were compared by a
contingency table and the χ²-statistic. No significant differences were found;
no trends in the changes between the two patterns could be identified.

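The comparison of two problem-solving patterns by a contingency table can
be made concrete with a small Python sketch of Pearson's chi-square statistic.
The counts below are invented for illustration, and the two-category layout
and function name are assumptions; the abstract summarized above does not
report the actual table.

```python
def chi_square(table):
    """Return Pearson's chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts for 64 subjects.
# Rows: dominant pattern on the timed test; columns: on the untimed test.
table = [
    [20, 12],
    [14, 18],
]
statistic, dof = chi_square(table)
# The critical value for 1 degree of freedom at the .05 level is 3.841.
significant = statistic > 3.841
```

For these made-up counts the statistic is about 2.26 with 1 degree of
freedom, which falls below 3.841 and so would not be judged significant at
the .05 level.
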
Schwieger (1974) identified a theoretical model for analyzing
mathematical problem solving which consisted of eight basic abilities. Face
validity for the model was obtained by generating operational definitions of
each ability and by gathering the comments and opinions of mathematics
educators and mathematicians after giving them a list of the abilities together
with their definitions, descriptions, and examples. Finally, a collection of
mathematical problems from the areas of arithmetic, algebra, and geometry were
used in problem-solving interview sessions with secondary school,
undergraduate, and graduate students. The total number of students interviewed,
the number of students at each level, the number of problems given to each
student, and the length of the problem-solving interview were not described in
the abstract. During the problem-solving interviews students were asked to
think aloud. The resulting protocols were tape recorded and analyzed. The
analysis indicated that the basic abilities of the model were "necessary and
sufficient for explaining the observed problem solving process."

In his study Webb (1975a, 1975b) explored the use of problem-solving
processes by high school students. The subjects were forty second-year
algebra students (20 males, 20 females). They were asked to think aloud
while solving eight problems from the experimenter-developed Problem Solving
Inventory (PSI) of mathematical problems including the areas of geometry,
algebra, and analytic geometry (Kulm, 1977). Sixteen pretest measures of
cognitive and affective variables were administered. These variables included
mathematics achievement, attitudes toward mathematics, spatial ability, verbal
ability, reasoning ability, and problem-solving ability. A coding scheme
adapted from Kilpatrick (1968) was used to record the tape-recorded protocols
from the thinking aloud interviews; this scheme yielded a total score on the
PSI and the frequency with which each of the problem-solving processes was
used. Principal component analyses were performed separately on the pretest
and process scores. A regression using the component scores as the independent
variables and the total PSI score as the dependent variable showed that
the Mathematical Achievement component accounted for 50% of the variance
in the total scores. Heuristic strategy components, a subset of the process
components, accounted for an additional 15% of the variance. Of the 10
heuristic strategies tested, eight were used to solve one or two problems. No
sex differences were found. Overall it was concluded that better problem
solvers use a wider range of strategies and techniques than do poorer problem
solvers.



In a study similar to Webb's (1975a, 1975b), Gimmestad (1977)
explored the processes used by community college students. Subjects (N = 60)
were randomly selected from mathematics students attending two community
colleges in Colorado. During each 1½-hour interview measures of IQ,
mathematics achievement, conceptual tempo, and mathematical problem solving
were administered. Subjects thought aloud while solving the eight problems on
the investigator-developed mathematical problem-solving inventory (Kulm,
1977). The interviews were tape recorded and process coded. Gimmestad
reported that the most popular processes with the subjects were deduction,
trial-and-error, and equations. Significant correlations (p = .05) were found
between total problem-solving score and use of the processes of exploratory
manipulations (r = -.34), successive approximation (r = .37), and deduction
(r = .30). Conceptual tempo, age, sex, and IQ were not significantly related
to mathematical problem-solving performance, but a significant correlation
(p = .05) was found between mathematics achievement and mathematical
problem-solving performance.

Blake (1977) attempted to determine the effects of problem context
and the degree of field independence upon processes used in solving
mathematical problems. Subjects were 40 eleventh-grade Algebra II students
randomly selected from students in 14 classes; they were of average ability
for students enrolled in their program (IQ range: 115-125). Subjects were
matched using their scores on Witkin's Embedded Figures Test (Witkin, 1950).
One subject in each pair was randomly assigned to one of the testing groups.
One testing group was given five mathematical problems in a real world
setting; the other group was given the same problems in a mathematical setting.
Subjects were instructed to think aloud as they solved these problems during
individual tape-recorded interviews. The protocols were coded using a system
based on a mathematical problem-solving model by MacPherson. Blake found that
problem context is unrelated to heuristics and the degree of field
independence had a marked effect upon the use of heuristics and the number of
correct solutions. Field independent subjects demonstrated use of a greater
variety of heuristics (r = .33), more willingness to change their mode of
attack (r = .27), and a greater number of correct solutions (r = .30). Both
the total number and the number of different heuristics used accounted for a
significant amount (p < .01) of the variance in the number of correct
solutions. In particular, the use of heuristics accounted for an additional
21% of the variance not accounted for by core procedures (algorithms,
diagramming, equations, and guessing). Changing mode of attack was
significantly related (p < .Qj) to obtaining a correct solution.

The following seven cognitive processes were studied by Hollowell
(1977): (a) understand the problem, (b) recall from memory, (c) formulate
a hypothesis or general idea for problem solution, (d) attempt to find a
provisional solution or develop a method of solution, (e) check against
solution model or general form of answer, (f) verify provisional solution
correct, and (g) reject provisional decisions. Subjects were 30 high school
juniors who thought aloud while solving three mathematical problems. One of
these problems did not require specific algebraic or geometric knowledge, one
was an algebra word problem, and one was a geometry proof. The investigator
found that the problem-solving sequences for the three problems, while
similar, had some important differences. The recall process (b) appeared more
frequently in the process coding sequence for the geometry problem than in
the sequences for the other two problems. In the sequences for the algebra
word problem a rejection (g) tended to be followed by a new attempt at
provisional solution (d). For the other two problems a rejection tended to be
followed by the formulation of a new hypothesis (c). Total number and kind of
processes used did not appear to be related to success or failure.

Ortiz-Franco (1978) hypothesized that: (a) the relationship between
mathematical problem solving and reading ability, mathematics achievement,
and reasoning was different for Chicano students and Anglo students; (b)
reading, mathematics achievement, and performance on his problem-solving
inventory (PSI) are significantly related to field dependence; and (c) the use
of trial-and-error processes differentiates better than field dependence between
problem solvers. The subjects were 40 Chicano students who had not taken an
algebra class. Half of the subjects (9 males, 11 females; mean age 14.93 years)
were dominantly Spanish speaking, and half (10 males, 10 females; mean age
14. .'^8 years) were dominantly English speaking. Pretests of mathematical
achievement, reading achievement, reasoning, field dependence, divergent
thinking, and anagrams were given to each subject; these tests were in the
subject's dominant language. The problem-solving inventory was adminis-
tered in individual interviews where the subjects were instructed to think
aloud. The investigator reported a significant correlation (p < .01) between
problem-solving performance and mathematical achievement; the correlation
coefficient was not reported in the abstract. There were no other significant
differences.

Problem-solving Assessment Not Dependent on Thinking Aloud 

Many of the studies in the heuristic instruction section and all of the
above studies have depended on the thinking aloud procedure to assess prob-
lem-solving performance. This procedure is easily the most popular one at the
present time; however, there are some disadvantages and some serious, unan-
swered criticisms of the procedure. The most serious disadvantage is the
amount of time it takes to interview each subject individually and to have an
examiner present at those interviews. A second disadvantage is the difficulty of
training persons to reliably code the resulting audio- or videotapes of subjects'
behavior. As several studies in this volume demonstrate, persons can be trained
to reliably code the process behaviors. Therefore, reliable coding is a disadvan-
tage and not a criticism of the procedure.

Criticisms include the following: (a) subjects may not report all of
their thoughts, but only those which are "safe" or "acceptable"; (b) the prob-
lem-solving processes used during thinking aloud interviews may be different
from those the same subject would use to solve the same problem while not
verbalizing; (c) the equipment required to record a thinking aloud interview
may distract the subjects; (d) it is very difficult, if not impossible, to employ the
procedure successfully when the subjects are young; and (e) at present, the
resulting process code data must be radically altered to analyze data from
thinking aloud interviews (see Flaherty, 1973; Hallgren, 1976). Thus, in the
years from 1969 to 1978, four studies, including those by Zalewski and
Wearne reported in Chapters 7 and 8, have attempted to find other ways to
assess problem-solving performance.

One important aspect of mathematical problem solving, the construc-
tion of valid proofs, was investigated by Lester (1973, 1974). A group of 19
public school children was randomly chosen from grades 1-3 (Group A1), 4-6
(Group A2), 7-9 (Group A3), and 10-12 (Group A4). The problem tasks
involved mathematical proofs in a simple mathematical system. Computer-
assisted instruction was used in presenting the tasks to control order of presen-
tation and to record several aspects of subjects' behavior (e.g., responses, re-
sponse times, errors, and number of trials). Criterion variables used to com-
pare groups were number of tasks solved, number of tasks attempted, number
of incorrect applications of rules of inference, trials in excess of the minimum
required for solution, trial difficulty, presolution time, and total time per task.
Time variables and nontime variables were analyzed separately using mul-
tivariate one-way analysis of variance; both tests yielded significant differences
(p < .001). There were significant univariate differences for number of in-
ferences (IAR), trial difficulty (TD), and total time (TT). Using Tukey's
method of multiple comparisons, the following significant results were found:
A1 < A4 and A1 < A3 for TP (p < .01); A1 > A4 (p < .01),
A1 > A3 (p < .01), and A1 > A2 (p < .05) for IAR; A1 > A4 and
A1 > A3 for TD (p < .01); and A1 > A4, A1 > A3, A2 > A4, and
A2 > A3 for TT (p < .01).

Maxwell (1975) used a block problem in her study of problem-solving
performance. Three items hypothesized to be convergent in type and three
items hypothesized to be divergent in type were administered to 105 students
enrolled in high school geometry. On the basis of the resulting pairs of scores,
these students were divided into four groups: high on both, high convergent-low
divergent, high divergent-low convergent, and low on both. Subjects
(N = 49) were chosen from each of the four groups. Each subject was ob-
served individually while solving the Ten Block Problem (see Schwartz,
1973) which required arranging colored blocks in a four by four array. Two
problem-solving trials were given to each subject. During the first trial the
investigator recorded a subject's use of problem-solving processes. Next, the
subject wrote a protocol describing the methods used during that trial to solve
the problem. Then the subject solved the problem a second time. The times of
both trials were recorded. From these data Maxwell reported the following
generalizations: (a) subjects who scored high on the divergent-in-type items
made fewer generalizations in their written protocols, used trial-and-error so-
lution methods more frequently, and took more time on the second trial of the
Ten Block Problem than subjects who scored low on the divergent-in-type
items; (b) trial-and-error played a major role in the problem-solving task ini-
tially and a minor role, subsequently, as the solution was approached; (c)
trial-and-error increased the time needed to work the problem and seemed to
be one of the main characteristics of an ineffective problem solver; and (d)
girls made fewer generalizations in their written protocols, used trial-and-
error more frequently, and, on the average, required more time to solve the
problem during the second trial than the boys.

Correlates and Factors of Problem-solving 
Performance 

The studies in the previous section were concerned primarily with as-
sessments of problem-solving performance. An equally interesting topic is the
search for those cognitive, affective, personality, school, and demographic vari-
ables which are related to problem-solving performance. Three studies con-
ducted at the University of Wisconsin-Madison investigated the relation of
variables in these classes to problem-solving performance. They are described
in Chapters Nine, Ten, and Eleven of this volume. In addition, four other
studies conducted between 1969 and 1978 will be described in this section.

Dodson (1971, 1972) attempted to describe problem-solving perfor-
mance in terms of (a) mathematics achievement, (b) cognitive and personal-
ity traits, (c) teacher characteristics, and (d) school and community charac-
teristics. The tests (Wilson, Cahen, & Begle, 1968b) from the Z-population
of the National Longitudinal Study of Mathematical Abilities (NLSMA)
(Romberg & Wilson, 1969) were used in this study. From the items on the
mathematics achievement tests administered to the NLSMA Z-population at
the end of eleventh grade, Dodson chose 40 items for his measure of mathe-
matical problem-solving performance. Using the scores on this measure, the
portion of the NLSMA Z-population who took mathematics in eleventh grade
was stratified into six ability groups. Subjects (N = 1,123) for this study were
a stratified random sample of 10% of the students in the Z-population for
which complete test data were available. Analysis of variance and discriminant
analysis were used to order the variables from best to poorest as discriminators
of problem-solving performance. All of the mathematics achievement variables
were significant discriminators (p < .001) among the six ability groups.
Four variables (Z111, Z105, Z102, and Z307) were identified as the best
discriminators, and three variables (Z104, Z202, and Z004) as the poorest.
Dodson characterized these, respectively, as test items requiring synthesis of
relatively advanced or seemingly unrelated mathematical ideas or use of alge-
braic equations, and test items requiring little synthesis and involving rela-
tively elementary mathematical ideas.

From the analysis of variance, all of the cognitive variables except one
(PZ007 Picture Differences) were significantly related (p < .001) to prob-
lem-solving performance. In particular, the reasoning cognitive variables were
better discriminators than the other cognitive variables. Generally, the person-
ality variables were poorer discriminators between the ability groups than the
cognitive variables. One of the variables (Messiness) showed no significant
relation to problem solving, while its counterpart (Orderliness) had a signifi-
cant negative relationship (p < .01). Only hypotheses were offered regard-
ing the exploratory search of the teacher data, since data were collected from
the subjects' eleventh-grade teachers and not from others who might have
shaped their problem-solving performance. Finally, it was reported that the
school and community variables were poor discriminators of the ability
groups.

In a more limited study, Robinson (1973) tried to identify cognitive
and affective characteristics of good and poor mathematical problem solvers.
Initially, the following tests were administered to 115 sixth-grade students: an
investigator-developed, 16-item problem-solving test; the Mandler-Sarason
Test Anxiety Scale for Children (Mandler & Sarason, 1952); the Cooper-
smith Self-Esteem Inventory (Coopersmith, 1959); and the Kagan Matching
Familiar Figures Test (Kagan & Moss, 1962). The Lorge-Thorndike intel-
ligence scores (Lorge, Thorndike, & Hagen, 1966) and the Iowa Test of
Basic Skills (Lindquist & Hieronymus, 1973) scores in reading comprehen-
sion, arithmetic concepts, and arithmetic problem solving were obtained from
the school records for these students. Good problem solvers (in the top one-
third on the problem-solving test) and poor problem solvers (in the bottom
one-third) were compared on each of the other variables. Next, 10 good and
10 poor problem solvers of similar IQ thought aloud as they solved five mathe-
matical problems, and their problem-solving behaviors were categorized and
compared. Comparison of the problem-solving scores to the other variables
showed that good problem solvers had significantly higher scores on IQ, read-
ing comprehension, arithmetic concepts, arithmetic problem solving, and self-
esteem, and significantly lower scores on test anxiety than the poor problem
solvers. There was a significant relationship between problem-solving perfor-
mance and reflective and impulsive behavior; more impulsive students were
poor problem solvers and more reflective students, good problem solvers. The
probability levels of the significant results were not given in the dissertation
abstract. No significant differences were reported as the result of analyzing the
interview data.

The last two studies in this group are concerned with the relation of
spatial ability to problem-solving performance; hence, they are akin to that
conducted by Schonberger (Chapter 10). In the first, Handler (1977) em-
ployed an experimental set of geometric spatial visualization problems to in-
vestigate the problem-solving processes and spatial visualization abilities of
competent high school students. The subjects were 25 eleventh- and twelfth-
grade students, each of whom participated in three individual interviews. The
first interview was devoted to collecting personal data and acclimatizing sub-
jects to the experimental procedures. The second and third interviews were
used to solve the 10 experimental problems. These problems evaluated the
variables of spatial visualization, imagination, visual memory, geometric con-
cepts, and critical thinking, among others. Diagrams accompanied half of the
problems. Certain exercises were alternately dictated and presented in written
form to check on the interference effects of reading with visualization. Solu-
tions to three of the problems required entirely oral responses; these were tape
recorded. Student answer sheets and drawings, records of solution procedures,
overt visualization behaviors, pre- and postsolution subject comments, and
elapsed times were analyzed. In addition, subjects rated each problem accord-
ing to difficulty, degree of confidence in their solution, and extent of their effort.
The data analysis procedures for these data were not described. The processes
used by the subjects were classified as deductive, insightful, or extractive. The
deductive mode predominated; insightful solutions were not observed. Using
the Space Relations Subtest of the Differential Aptitude Tests (DAT) as a
measure of spatial visualization, good and poor visualizers were identified.
Sizable discrepancies occurred in the ranks of the DAT subtests and the prob-
lem set.

Moses (1978) investigated the nature of spatial ability and spatial
problems, and the roles they play in mathematical problem solving. Subjects
were 145 fifth-grade students in four intact classes. All subjects were pre- and
posttested using five tests of spatial ability (Punched Holes, Card Rotations,
Form Board, Figure Rotations, and Cube Comparisons) (French et al.,
1969a, 1969b) and an experimenter-designed problem-solving inventory. The
ten problems on the problem-solving inventory represented three types of
problems, namely, spatial, analytic, and equally spatial and analytic
problems. Two scores, a problem-solving score and a degree of visuality score,
were obtained from the problem-solving inventory. After pretesting, two of the
four intact classes were randomly assigned to the 9-week experimental treat-
ment which consisted of instruction in perceptual techniques and visual solu-
tion processes. Correlational analyses of the pretest data showed that of the five
spatial tests only one, Cube Comparisons, was not correlated significantly with
the others (no probability level given), spatial ability was correlated signifi-
cantly with the problem-solving performance (r = .30, p < .01) and degree
of visuality (r = .17, p < .05), and problem-solving performance and degree
of visuality were not significantly correlated. Factor analysis of the pretest data
showed that four of the spatial tests loaded on one factor while Cube Compar-
isons loaded on another. This result confirms the result of the corresponding
correlational analysis as does a separate analysis of electroencephalogram
(EEG) data. Analysis of covariance, using the pretest scores as the covariate,
was applied to the posttest data to measure the effects of the experimental
treatment. There were no significant differences between the experimental and
control classes on problem-solving performance or degree of visuality.
The experimental treatment did significantly increase
(p < .10) problem-solving performance on spatial problems, and there was a
significant increase (p < .10) in spatial ability. The hypothesis that females
would gain more from the treatment than males (Fennema & Sherman,
1977) was not supported.

Conclusion 

This chapter has described problem-solving studies in mathematics
conducted from 1969 to 1978 in the United States and Canada. The descrip-
tion is limited in that only studies similar to those initiated and completed at
the University of Wisconsin-Madison during the same time period are in-
cluded. Thus, each study meets the following two criteria: (a) the problems
used in the study were mathematical problems; and (b) the research was in
the area of instruction in heuristics, measurement of problem-solving perfor-
mance, or correlates and factors of problem-solving ability. In these three
areas 31 studies not conducted at Wisconsin were found and are described: 18
dealt with heuristic instruction, nine with assessment of problem-solving per-
formance, and four with correlates and factors of problem-solving ability. In
the following chapters nine additional studies are described. Therefore, from
1969 to 1978, 40 studies were conducted in the United States and Canada
which met the two criteria imposed when searching for research reports to
include in this chapter.
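
The tally in the preceding paragraph can be checked with simple arithmetic. The short sketch below is only a reader's check; the category labels and variable names are shorthand introduced here, not terms from the studies themselves.

```python
# Study counts exactly as stated in the paragraph above.
# The dictionary keys are shorthand labels chosen for this sketch.
reviewed_elsewhere = {
    "heuristic instruction": 18,
    "assessment of problem-solving performance": 9,
    "correlates and factors": 4,
}
wisconsin_chapters = 9  # the "nine additional studies" in the later chapters

subtotal = sum(reviewed_elsewhere.values())
grand_total = subtotal + wisconsin_chapters

print(subtotal, grand_total)  # prints: 31 40
```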

This author hopes, and believes, that he has found all of the research
studies which meet these criteria. However, there is one extant, widely known
collection of studies which is not described. Those are the Soviet studies of
mathematical problem solving which have been translated and published in
this country (Clarkson, 1975a, 1975b; Kantowski, 1975a; Kilpatrick & Wir-
szup, 1969a, 1969b, 1970, 1972; Krutetskii, 1976; and Wilson, 1975). Cer-
tainly these reports have influenced problem-solving research in mathematics
conducted in this country. In fact, some of the studies described in this chapter
explicitly cite their use of the techniques employed by the Soviets. Thus, one
may wonder why this collection of studies is not described here. First, it
seemed more important to describe, as fully as possible, the studies actually
conducted in the United States between 1969 and 1978; many of the Soviet
studies are older than this. Second, the Soviets' concept of the individual and
individual differences, and their use of different methods of collecting, summa-
rizing, and interpreting data limit the ability to use their findings without
replicating their experiments in the United States, or to compare their results
to similar studies conducted here. Thus, it was decided not to include them in
this chapter. Parenthetically, it should be pointed out that the translated Soviet
studies do provide an interesting, informative perspective on problem-solving
research in that country. Research should be conducted in this country on
many of these questions.

In 1969 Kilpatrick (1969, 1970) commented that a good share of re- 
search in mathematics education was being done by doctoral candidates, there 
was an increasing number of methodological blunders, and some investigators 
were apparently ignorant that statistical assumptions were being violated. In 
addition, he stated that, because of our ignorance of mathematical problem- 
solving, clinical studies should be conducted in this area before large-scale, 
complex studies are attempted. Kilpatrick's remarks are equally true today.
However, as the 31 studies described in this chapter and the nine which follow 
illustrate, researchers have become more aware of methodological constraints
and more sophisticated in their use of statistical procedures. It also seems that
Kilpatrick's advice regarding the kinds of studies that are necessary and that
should be undertaken has been heeded, as most of the studies between 1969 and
1978 have been clinical in nature. Perhaps the time will come when enough
will be known about problem solving in mathematics to attempt larger-scale
studies.
Chapter 3



The Small Group Discovery Method: 
1967-1977 

Neil Davidson 

Does a method of teaching mathematics exist which simultaneously
fosters active learning, thinking, student pacing, and interpersonal communi- 
cation? It seems apparent that the lecture method, instructional television, 
programmed instruction, nonprogrammed self-paced methods, the teacher-
directed discovery method, and the Moore method (Whyburn, 1970) all fail
in at least one of these functions. 

There is no need to reiterate the familiar arguments for active learning, 
student thinking, and student pacing. However, the inclusion of interpersonal 
communication as the fourth function may surprise some readers. Interper-
sonal communication in education can have social benefits as well as enhance
mathematical learning. Student discussion of mathematics has been empha- 
sized by Buck (1962, p. 563): 

. . . Let me remind you that student-student interactions are also im- 
portant in learning, and that at the professional level, much mathemati-
cal research springs from discussions between mathematicians. More-
over, a test of understanding is often the ability to communicate it to
others; and this act itself is often the final and most crucial step in the 
learning process. 

On philosophic, psychological, and biological grounds, various authors 
have stressed the affiliative needs of human beings and the social impetus for 
human activity (Dewey, 1916; Montagu, 1966). However, there are a 
number of forces in society and in education which ignore these affiliative 
needs and generate depersonalization, anonymity, loneliness, anxiety, and 
alienation (Association for Supervision and Curriculum Development, 1967; 
May, 1953; Sarnoff, 1966). Such forces are present in many modern universi-
ties where many students spend a substantial amount of time as anonymous
members of mass lecture sections. Interpersonal communication should be em- 
phasized because it promotes student discussion of mathematics, counters soci- 
etal forces toward loneliness and anxiety, and provides personal support in the 
educational process. 

It appears that these goals can be achieved by dividing the class into 
small groups where the students can discuss mathematical problems with a 
few colleagues. The number of students per group is deliberately restricted to 
increase possibilities for personal contact. Small group instruction can foster 
active participation and, to a large extent, student pacing. Moreover, in small
group instruction, the amounts of discovery and guidance can be varied, de-
pending on the desired level of student thinking. Finally, small group learning
in conjunction with discovery learning offers possibilities for curriculum devel-
opment if differences in student learning are observed.

In this study, the subject area of elementary calculus was selected for 
exploration. Positive results in calculus instruction were attained previously 
through the "student experience-discovery approach" of Cummins (1960)
and the heuristic problem-solving approach of Larsen (1961). The Moore
method was used by its developer to teach calculus (see Moise, 1965). Kings-
bury (1963) made successful use of a self-pacing activity method in calculus
instruction, and Turner, Alders, Hatfield, Croy, and Sigrist (1966) used
small group instruction as a supplement to several large lectures in calculus
per week. There was no record of a previous attempt to develop a method of
calculus instruction combining discovery learning with a small group ap-
proach. This combination, named the "small group-discovery method," was
first employed by the author in a 1967-68 pilot study with a freshman calculus
class at the University of Wisconsin-Madison.

The remainder of this chapter is divided into the following sections: the 
design of the small group discovery method using Dewey's educational philos-
ophy, an elaboration of the design based on studies in social psychology, a
description of the classroom social climate during the pilot study, a description
of how students interacted with the mathematics content, data for evaluating 
the pilot study, conclusions and questions for investigation, and work com- 
pleted since the pilot study. 

Classroom Practices Derived from Educational 
Philosophy 

George Polya (1903) has emphasized student thinking, active learn-
ing, discovery learning, and interest in mathematics. It has not been widely
recognized in the mathematical community that these are particular aspects of
a general philosophy of education and of life, whose foremost advocate in edu-
cation was John Dewey (1916, 1938). The small group discovery method was
designed in accordance with that philosophy. Supporting evidence and further
elaboration were provided by studies in social psychology. A description of
Dewey's philosophy is beyond the scope of this chapter. However, this section
does summarize classroom practices derived from Dewey's philosophy and
applied in the 1967-68 pilot study with a freshman calculus class.

During the pilot study, students learned mathematics by doing mathe- 
matics. The approach was one of guided discovery in which mathematical 
topics were introduced as questions to be investigated by the students. The 
students, with limited guidance from the teacher, formulated definitions, stated 
theorems, proved the theorems, constructed examples and counterexamples,
and developed techniques for solving classes of problems.

The classroom activity was a social process taking place in small 
groups. Within each group there was to be a cooperative atmosphere where 
students worked together to solve the problems. 

The teacher adopted a democratic leadership style by participating in 
the students' activities, but not in a highly directive way. The teacher spent 
most of each class period with the small work groups. He kept track of the 
progress of the groups, made corrections and suggestions, clarified notation, 
gave hints, checked solutions, provided encouragement, and tried to see that 
the groups functioned smoothly. 

At times the teacher talked with the entire class, generally for no more 
than 5 or 10 minutes each day. During these brief discussions, new concepts 
were presented, questions were raised and answered, problems were assigned, 
hints were given, clarifications and summaries of student work were made, and 
student discussions were moderated. 

There was no textbook for the experimental course. To guarantee that
all of the major calculus topics were included, the teacher proposed problems;
this practice departed from Dewey's philosophy. The presentation of mathe- 
matical topics proceeded from the more concrete to the more abstract. The 
discovery of new ideas was emphasized rather than the expression of ideas in 
an impeccable form. Professional standards of rigor were not imposed upon
these students, and initial development of ideas was informal in character. It 
was anticipated that the need for increased precision and theoretical security 
would become apparent to the students as they handled more difficult or ab- 
stract problems. 

Since interest in the mathematical content was intended to provide the 
major source of motivation for the students, the teacher attempted to determine 
which topics were of intrinsic value, which appeared to be useful (instrumen-
tal), and which had little value from the students' perspective. In addition, a
nonthreatening classroom atmosphere and reduced emphasis upon grades (the
use of an A-B grading scale, elimination of in-class examinations, and student
determination of some grading policies) were employed.

Skills were developed under conditions where thought was necessary. 
The students developed the techniques for solving each class of problems 
presented to them. Additional practice occurred when problems differed from 
each other and when judgments were needed to find solutions. Whenever pos-
sible, skills were attained by solving problems of intrinsic value to the
students. 

Within this basic framework many questions occurred to the students. 
Investigation of student-generated questions was one of the important class
activities. Daily notes, prepared by the teacher, contained a record of problems
solved by the students and of questions which arose.

Classroom Practices Derived from Social 
Psychology 

Supporting evidence and elaboration for these classroom practices 
were provided by empirical studies in social psychology and group dynamics. 
In a classic study of leadership styles conducted by White and Lippitt (1960),
the adult leaders of boys' clubs were trained to be proficient in using authorita-
rian, democratic, and laissez-faire styles of leadership. Characteristics used in
defining and comparing the styles included the degree of involvement of the
leader with the group, the degree of warmth or impersonality of the leader, the
locus of control and procedures for decision making, the use of orders versus
suggestions, and the objectivity and frequency of evaluative comments by the
leader.

The authoritarian (autocratic) leader determined all policies, dictated
techniques and activities one step at a time, dictated work tasks and work
companions, offered much nonobjective praise and criticism, and gave orders
and disrupting commands.

The democratic leader helped the group make policies through group
discussion and decision making, provided an activity perspective and sketched
general steps toward the group goal, suggested alternative procedures from
which group members could choose, allowed members to select work partners
and to determine division of labor, offered a small amount of objective praise
and criticism, provided guiding suggestions when needed, and acted in a
friendly and equal manner.

The laissez-faire leader allowed complete freedom for group or indi-
vidual decisions, participated to a minimal extent, supplied materials, sup-
plied information upon request, commented infrequently upon members' ac-
tivities, and offered almost no appraisal of the work.

White and Lippitt found that the quality and quantity of the work was 
greater in the democratic situation than in the laissez-faire situation. There
was not a clear distinction in terms of quantity and quality of work between
the authoritarian (autocratic) and democratic situations. However, genuine
interest in the task was "unquestionably higher" in democracy than in
autocracy.

There were numerous indications that morale was higher in the demo-
cratic situation than in the autocratic situation. The autocratic groups were
marked by discontent and a tendency toward group fragmentation. In an ag-
gressive reaction to autocracy, there was a large amount of hostility; in a sub-
missive reaction, the group atmosphere was subdued and low-spirited. In both
reactions there were submissive and dependent actions toward the adult
leader.

Anderson (1963) reviewed 49 empirical studies defining leadership
along an authoritarian-democratic dimension. Most studies did not include
the laissez-faire style. Thirty-two of the studies dealt with leadership in educa-
tional settings; the results were not conclusive with respect to measures of stu-
dent learning. However, morale was generally higher in the democratic
(learner-centered) groups, except in a few cases involving "high anxiety
about grades which are awarded on the basis of final examination
scores."

On the basis of the research by White and Lippitt (1960) and by An-
derson (1963), the teacher of the calculus class in the present study used a
carefully specified style of democratic leadership. He provided a perspective on
each day's mathematical activities in a brief discussion with the entire class
and spent most of the period working with small groups. He refrained from 
giving orders or disrupting commands. There was only a minimal amount of 
objective, constructive praise and of critici.sm. Usually criticism was directed to 
I he work group as a whole and not to individuals. The teacher offered guiding 
suggestions at times when they were needed; these included mathematical 
hints and suggestions about work organization and group functioning. He 
sometimes provided technical information upon request, and stimulated self- 
direction by encouraging members to detect group errors and think through 
and elaborate upon their ideas. The teacher developed a friendly, social rela- 
tionship with the students and behaved in an egalitarian manner which in- 
cluded reciprocal use of first names. Finally, many policies in the calculus class 
were arrived at through group discussion and decision-making by a majority 
vote.

In this study, decisions about cooperation or competition within the 
work groups were made by the teacher on the basis of research conducted by 
Deutsch (1960) at Massachusetts Institute of Technology. Deutsch found 
that the productivity of a competitive discussion group was reduced by poor 
coordination, duplication of efforts, inattentiveness to the ideas of others, ob- 
structive and self-defensive behavior, and group conflict. In a cooperative situ- 
ation, as compared with a competitive situation, the group members were 
more friendly, listened more attentively, and understood the ideas of others
better. Moreover, the group discussion was more productive in terms of the
quantity and quality of problem-solving ideas generated. In accordance with
these results, the teacher in the calculus class promoted cooperation within 
each work group by checking the group solution without asking who was
responsible for it and by not giving individual grades for classwork. He
emphasized the need for joint efforts to solve difficult problems, the importance
of listening carefully and building upon the ideas of others, the fact that one
person's good idea helps the entire group, and the goal of solving the problem
so that all members understand the group solution.

Studies have shown that pressure to "go along with" a group can lead
to the modification or distortion of individual judgment or perception (Asch,
1960). Hence, this conformity pressure and independent thinking are
antithetical to one another. Fortunately, it is possible to reduce conformity in
problem solving by developing group standards which encourage members to
follow their own judgment (Deutsch & Gerard, 1960). The teacher in the
calculus class developed such standards by emphasizing the importance of
independent judgment, the legitimacy of disagreement, and the obligation of
group members to give reasons supporting their statements. The teacher
intervened as a mediator when students looked puzzled or confused or when several
group members put undue pressure on a dissenter. The teacher emphasized
the distinction between thoughtless conformity and a change of opinion based
upon a thoroughly understood argument.

A commonly held misconception is that every group must have a leader
(Cartwright & Zander, 1960). In the calculus class there was no clear need to 
appoint a leader for each group. Moreover, there were risks involved in 
designating group leaders, since the opportunity for active participation by the 
followers would then be reduced and since there might be hostility between the 
leader and those who wished to depose him or her. Therefore, the work groups
operated without designated leaders. Although it was not possible to create a 
completely egalitarian work group, it was possible to place limitations upon 
the discrepancy in power between the most active and least active group
members. No person was allowed to dominate the discussion in a manner that
excluded or severely limited contributions from others. Whenever necessary, the
teacher influenced the dynamics of particular groups by drawing certain
members into the discussion, suggesting that different people assume primary
responsibility for writing problem solutions on the blackboard, and using other
techniques to promote cooperation. 

It was necessary to maintain small work groups, since the opportunity
for active participation would decrease as group size increased. There was
some empirical evidence available of the effects of group size on group
interaction in nonmathematical discussions. Two-person discussion groups were
found to be marked by a tense, cautious atmosphere in which the members
tended to avoid conflict and expression of their ideas. In two-person groups
there was no one to resolve differences, and either member could bring the
group to a halt by disagreeing or withdrawing (Bales & Borgatta, 1961).
Three-person groups tended to break up into a pair and an isolated member.
Four-person groups could split into two subgroups of equal size and thereby
produce a protracted argument or deadlock (Bales & Borgatta, 1961; Mills,
1960). Five-person groups entailed the dangers of competition, exclusion of
members from the discussion, and the need for a definite leader (Slater, 1958).
The experimental evidence was sufficient to rule out the two-member group
and the five-member group in an effort to achieve active student participation.
There was no clear case for selecting either the three-member group or the
four-member group, so the teacher simply chose the four-member group for
the pilot study. It was not clear that the teacher could give adequate attention
to more than three groups during his first trial of the instructional method, so
the class for the pilot study was limited to 12 members.

Kilpatrick (1969) and Symonds (1958) have reviewed studies of the
detrimental effects of grades and anxiety on problem-solving performance and
creativity. Moreover, the grading problem has presented "the greatest
obstacle" to the success of some past attempts at student-centered instruction
(McKeachie, 1954). These observations provided support for the
philosophically based decision to use a permissive grading scheme for the calculus class.

The Classroom Social Climate During the Study 

For the first trial of the small group discovery method, the teacher set
some entrance requirements for the students. In order to join the class, a stu- 
dent had to be a freshman or sophomore with little or no prior knowledge of 
calculus, have grades of A or B in high school mathematics, and be at least 
mildly interested in mathematics. Students were selected for the class through 
interviews with the teacher at the course assignment committee. Only one
student who was interviewed decided not to join the class. The pilot class
consisted of four female and eight male students; there were 11 freshmen and one
sophomore.

Leadership by the Teacher 

The teacher made use of a democratic style of leadership during the 
pilot study. Many policies in the class were determined through group discus- 
sion and decision making by a majority vote, with the teacher serving as dis- 
cussion moderator. The students decided the membership and division of labor 
in their groups and the time schedule for changing membership. They selected 
take-home exams from a set of 11 alternative grading policies, permitted each
student to begin work on the exam at a convenient time during a one-week
period, and decided not to make up definitions or terminology which would 
conflict with standard mathematical usage. 

The teacher gave a perspective on each day's mathematical activities in
a brief discussion with the entire class. He often introduced new topics in the
form of questions for investigation by the students, such as: "How can we find
the area under a curve?" "What happens at a high or low point on a curve?"
"What can you say about a function which vanishes at the end points of its
interval of definition?" "Can we find a formula for the derivative of a
product?" "How can we find the volume of the solid obtained by revolving a curve
around an axis?"
Almost all mathematical discussions with the entire class lasted for less
than 10 minutes. The discussions set the stage for the main activity of small
group problem solving. Just enough input was provided in discussions so that
the groups could function productively for the rest of the class period.

The leader usually used praise and criticism in an objective,
constructive manner and directed it to the whole work group. However, there were two
examples of personal criticism during the year. In one situation, the teacher
said to someone, "______, you still don't know your derivative formulas." She
immediately froze up and was less friendly to the teacher for the next week,
during which she participated less than usual. In the other situation, the
teacher said to a student, "Why don't you look at the solution and point out
your mistake?" The sarcastic reply was, "Well now, if I'd known it was a
mistake I wouldn't have done it, would I?"

The teacher almost never gave orders or disrupting commands. In one 
exception to this, the teacher stopped the groups in the middle of proving the
chain rule and told them to think about it overnight. While some students were
relieved, others expressed resentment: "I sure hate to get cut off in the middle
of a problem." On several occasions the teacher asked the groups to stop
working near the end of the period so that he could present a summary. It quickly
became apparent that students did not care to have a summary at the end of the
period. They kept on talking about the problems, and several students stated
that they knew what they had done and required no further reiteration to
understand it. The practice of end-of-period summaries was rapidly 
abandoned. 

The teacher found it easy to keep track of group progress, since stu- 
dents wrote their problem solutions on the board. He frequently did not wait 
for a request for assistance, but offered suggestions at times when they
appeared to be needed. Usually, a visit with a particular group took less than 1
minute. Sometimes a visit lasted only 10 or 15 seconds, for example, if it was
only necessary to point out an arithmetic mistake, ask the reason for a step, or
check a simple solution. However, on difficult proofs the groups needed
considerable assistance, and visits to groups lasted 2 or 3 minutes. If the teacher
stayed too long with one group, members of other groups began calling for
help. 

Guiding suggestions of a mathematical nature were given in the form 
of hints, sometimes using the heuristic techniques of Polya (1965). Here are
some examples:

1. The teacher frequently asked the students to concentrate on the
given data, the desired result, and relationships between the two. This helped
in many proofs, especially those using the definitions of the limit.

2. The teacher sometimes suggested that groups attempt to use prior 
results and to reason by analogy. For example, when students had trouble 
deciding whether a certain function had no limit or two limits at a point, the
teacher suggested a comparison with sequences such as 2, -2, 2, -2, ...,
where the same issue had been settled previously.

3. It was occasionally helpful to suggest that students consider a
simple instance of a general problem. This was done with n = 2 and n = 3 in
guessing the formula for the derivative of a product of n functions.

4. General results were sometimes formulated by considering special 
cases. The students correctly surmised the fundamental theorem of calculus 
after computing 

$\int_a^b x^k \, dx, \quad k = 1, 2, 3.$

5. It was sometimes useful to suggest that the students discover or con- 
firm results by drawing pictures. This was suggested when students could not 
remember if d(sin x)/dx is cos x or -cos x.

6. The suggestion to guess the answer to a problem sometimes led to 
some surprises. Students were convinced that the derivative of a product
should turn out to be the product of the derivatives.

7. A slight shift in notation occasionally made a big difference in prob- 
lem solving. In their first encounter with implicit differentiation, students had 
great difficulty in finding $dy/dx$ for $x^2 + y^2 = 1$. The hint to replace $y$ by $f(x)$
readily enabled students to find $f'(x)$.

8. There were occasions when a hint, given only once, lasted for the
remainder of the year. In the proof of the formula for the derivative of a
product, the hint was given to add and subtract the same term. For the rest of the
year the students correctly used this technique when needed. 
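
The add-and-subtract hint in item 8 can be illustrated by one standard form of the product-rule argument. The text does not record which term the class actually added and subtracted, so the choice of $f(x)g(x+h)$ below is an assumption, as are the symbols $f$, $g$, and $h$; both functions are taken to be differentiable at $x$.

```latex
\begin{align*}
\frac{f(x+h)g(x+h) - f(x)g(x)}{h}
  &= \frac{f(x+h)g(x+h) - f(x)g(x+h) + f(x)g(x+h) - f(x)g(x)}{h} \\
  &= \frac{f(x+h) - f(x)}{h}\, g(x+h) + f(x)\, \frac{g(x+h) - g(x)}{h} \\
  &\longrightarrow f'(x)\, g(x) + f(x)\, g'(x) \qquad (h \to 0).
\end{align*}
```

The last step uses the continuity of $g$ at $x$, which follows from its differentiability there.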

The teacher sometimes offered guiding suggestions with respect to the 
work organization and functioning of a particular group. Students often wrote 
four or five attempted solutions all over the board, and no one could tell where 
one idea ended and the next began. Many students omitted key symbols, for
example, writing sin = cos x instead of $d(\sin x)/dx = \cos x$ or
$\int \cos x \, dx = \sin x + c$. This caused great confusion on complicated problems, and
suggestions about blackboard technique were much needed. Suggestions about the
social functioning of the groups are described later.

The teacher provided technical information on request if the develop- 
ment or recall of that information was not a key part of the problem at hand. 
For example, a request was always honored for an approximation of the
number $\pi$ to five decimal places. A request was never honored to provide the
formula for $d\big((f(x))^{n}\big)/dx$. Other items of information, for example an
identity for $\cos 2\theta$, were provided for some problems but not others.

The teacher checked the group solutions of all the difficult problems or
theorems. In other problems, checking preferences varied. Some group
members always wanted their solution checked; other group members were quite
confident and erased their solutions without teacher checking. When enough
board space was available some groups left one solution up for checking while
working on another problem. 

The teacher attempted to stimulate self-direction by encouraging
people to look for errors in their group's solutions. Many errors were caught by
the students themselves, and others were detected by the teacher. There were
computation errors, incorrect applications of basic formulas
[$d(\sin 3x)/dx = \cos 3x$], errors in basic algebraic facts, logical errors of many types
(e.g., circular reasoning and proving a conclusion without using the hypothesis),
errors of over-generalization [$d(e^x)/dx = x e^{x-1}$], and errors in notation
[if $f(x) = x^3$ then $f'(x^3) = 3x^2$]. The teacher was surprised by the students' frequent
shifts from error to insight.

Although all groups began each new topic on the same day, some 
groups moved more quickly than others. The teacher always had some
challenging extra problems for groups which finished early.

The teacher developed a friendly relationship with the students and 
tried to reduce the gap in status between the students and himself. He often 
arrived a few minutes before class to chat about campus events, world
happenings, and so on, but saved personal problems for discussion outside of class. He
suggested a mutual first name basis, which made some students uncomfortable
at the start of the year: "Are we supposed to call you Professor Neil or Dr.
Neil, Mister Neil, or what?" This issue dissolved after a few weeks.

Dittoed notes were prepared by the teacher after each class meeting to
record the students' accomplishments on that day. The notes were distributed
at the following class meeting. Although the teacher had expectations about
the material to be covered on any given day, these expectations were wrong
more often than not. Students frequently encountered unexpected difficulties,
came up with novel problem solutions, or pursued questions not planned by
the teacher.

Group Formation and Size 

The students decided to change groups every 2 or 3 weeks or at the end 
of major units of content. The process of changing groups was awkward,
though brief. Students had different styles of coping with the change process.
Some said directly, "Let's work together." One girl always went straight to
"her corner" and waited for others to join her. Some students sat and
pretended to do homework until the groups were basically formed; then they
looked around for a vacancy. One person sometimes wandered around the
room looking lost until settling upon a group. Despite this awkward process,
the students refused a suggestion by the teacher to form groups by writing
down confidentially their most preferred and least preferred group members.

During the pilot study there were four members in each work group.
The four-member group functioned well on theoretical problems, but was
sometimes too large for optimal practice with standard computation problems. 
In computational problems the groups often split spontaneously into two 
pairs. When one member was absent the remaining three functioned well ex- 
cept when the problems were very difficult or the missing member normally 
exercised much leadership. On rare occasions when two group members were
absent, the remaining pair either limped along or split up and joined different
groups for the day. 

Cooperation

The teacher fostered cooperation and discouraged competition by
talking with group members. Here are some examples of teacher comments made
in various situations: "Some of these problems are very hard and you have to
work together to solve them quickly." "There is no need to blame anyone for a
mistake." "How about listening? Are you really disagreeing or just saying the
same thing in different words?" "Is it possible that you're both right?" "The
group goal is not only to solve the problem but to do so in a way that everybody
understands."

Roughly two-thirds of the students were cooperative, but at least three 
individuals believed in and practiced competition. When two particular com- 
petitors were in the same group they argued intensely, tended to exclude the 
other two members from the discussion, and were impatient in answering their
questions. When this pattern became clear, the teacher asked the two
competitors to work in different groups. On many occasions, a cooperative group
producing a small number of ideas solved problems as quickly as a competitive
group which generated many ideas but could not agree upon them.

The work groups usually functioned as separate units with little com- 
munication between them. Members of one group almost never borrowed 
ideas from the solutions of other groups. On numbered lists of problems,
people sometimes compared the problem number, but not the solution, between
two groups. Competition between groups occurred spontaneously, but very
rarely, and only on numbered lists of rather easy problems. When competition
arose between groups, it was done in the spirit of a lively game which no one
took seriously.

Leadership by Students

Although the groups operated without designated leaders, some group
members were much more active and influential than others. For the entire
year, one girl emerged quite clearly as the task leader on most problems no 
matter who else was in her group. No one else was able to match her consistent 
quickness and enormous enthusiasm. There were interaction difficulties in 
some of her groups at the beginning of the year, but these disappeared as
people deliberately chose to work with her or to avoid her group. At the other
extreme was a boy who was always the least active and the least influential
member of his group. He rarely made any mathematical suggestions or wrote 
any solutions on the board. However, when he did contribute an idea or
answer a question, he was almost always correct.

The behavior of the other group members fell between these extremes. 
More than half of the students were very active participants in problem solving
throughout the year. Some individuals participated to a greater or lesser extent,
depending on who else was in their group. Sometimes, a less influential group 
member had the basic idea for a solution, but the idea was either not heard or 
not accepted. When the teacher came over and made the very same suggestion, 
the person with the idea blurted out: "See, that's what I was trying to tell you 5
minutes ago!"

Students were very reluctant to criticize dominant behavior, even when 
it clearly interfered with group progress. The system of taking turns to write 
down the problem solution helped to some extent, especially on numbered lists
of problems. Another idea was for different group members to become
"experts" on certain problems, solve them outside of class, and present them to
their group. This approach turned out to be a disaster and was quickly
abandoned. 



Conformity 

Conformity pressure in the groups had to be reckoned with, but was 
not a serious difficulty. For example, on various occasions the teacher heard
someone say, "I suppose we've solved the problem but I don't see why it
works." Another student replied, "Never mind why it works, it just does. Let's
go on." A quick teacher intervention to check understanding resolved the issue. 
There were only a few incidents in which three group members put pressure 
on a single dissenter. In one case there was an intense discussion of the relative 
merits of approximating areas by rectangles or by little squares. The dissenter,
who favored squares, became visibly upset. The teacher then intervened,
pointing out that both sides were right for different purposes, and the approach
with squares would be used in a later semester. 



The Interaction Between the Students and the 
Mathematics Content 

Through daily conversations with the students and observations of
their work, the teacher learned much about their reactions to the subject
matter. The students developed problem-solving techniques and proved the major
theorems as expected, but affective and cognitive aspects of the student
interaction with the content were sometimes surprising. The close contact with
students in the study groups helped the teacher gain much insight into student
perceptions of calculus.

During the pilot study, most students had difficulty with testing
universal statements by particular instances, recalling definitions, and transferring
information from one problem to another. The students frequently did not test
incorrect universally quantified statements by using specific instances.
Examples of this included the incorrect formulas $1 + \sec^2 \theta = \tan^2 \theta$,
$\cos(x + y) = (\cos x)(\cos y) + (\sin x)(\sin y)$, and $d(\sec x)/dx = \tan^2 x$. Each time the students
wrote down such an incorrect identity, the teacher asked if there was any easy 
way to test the truth of their statement. It was usually necessary to tell the 
students to try their statement with a particular value of the variable. 
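
The teacher's suggestion, trying a statement at a particular value of the variable, can be sketched numerically. The following is only an illustration and was not part of the study; the helper names `first_counterexample`, `derivative`, and `sec`, the tolerance, and the two sample values are all assumptions made here.

```python
import math

def first_counterexample(lhs, rhs, values, tol=1e-9):
    """Return the first sample value at which lhs and rhs disagree, or None."""
    for v in values:
        if abs(lhs(v) - rhs(v)) > tol:
            return v
    return None

def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def sec(x):
    return 1 / math.cos(x)

samples = [0.0, math.pi / 4]

# Claimed: 1 + sec^2(t) = tan^2(t)
claim_1 = first_counterexample(
    lambda t: 1 + sec(t) ** 2, lambda t: math.tan(t) ** 2, samples)

# Claimed: cos(x + y) = cos x cos y + sin x sin y, sampled along y = x
claim_2 = first_counterexample(
    lambda t: math.cos(2 * t),
    lambda t: math.cos(t) ** 2 + math.sin(t) ** 2, samples)

# Claimed: d(sec x)/dx = tan^2 x
claim_3 = first_counterexample(
    lambda t: derivative(sec, t), lambda t: math.tan(t) ** 2, samples)

# A correct identity for comparison: 1 + tan^2(t) = sec^2(t)
correct = first_counterexample(
    lambda t: 1 + math.tan(t) ** 2, lambda t: sec(t) ** 2, samples)
```

With these samples, the second and third claims happen to agree at 0 and fail only at the second value, which suggests why a single convenient test value can be misleading.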

The students were apparently not used to thinking in terms of defini- 
tions, and they tended to forget major definitions from one day to the next or 
even from one problem to the next. Most students were persistently unable to 
recall definitions of the limit, the definite integral, and continuity. The defini- 
tion of the derivative fared somewhat better than the other definitions, perhaps 
because it was a simple formula which was frequently used. The students 
often tended not to use the major definitions, even in problems where use of the 
definition provided the only possible approach. For example, the students did 
not think of using the definition of the integral to test the integrability of the 
following function: $f(x) = 0$ if $x$ is rational, $f(x) = 1$ if $x$ is irrational, where
$0 \le x \le 1$.
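
For reference, a sketch of how the definition settles this example, assuming the integral was defined through upper and lower sums over partitions (the text does not say which formulation the class used). The partition points $x_i$ and the symbols $L$ and $U$ are notation introduced here.

```latex
\begin{align*}
&0 = x_0 < x_1 < \dots < x_n = 1, \qquad \Delta x_i = x_i - x_{i-1}, \\
&L = \sum_{i=1}^{n} \Big(\inf_{[x_{i-1},\,x_i]} f\Big)\,\Delta x_i
   = \sum_{i=1}^{n} 0 \cdot \Delta x_i = 0, \\
&U = \sum_{i=1}^{n} \Big(\sup_{[x_{i-1},\,x_i]} f\Big)\,\Delta x_i
   = \sum_{i=1}^{n} 1 \cdot \Delta x_i = 1.
\end{align*}
```

Every subinterval contains both rational and irrational points, so the lower and upper sums remain 0 and 1 for every partition, and the function is not integrable.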

There was a noticeable tendency for the students to treat all problems 
as separate and unrelated entities. For instance, the groups first evaluated 
$\int dx/(a^2 + x^2)$ by means of the substitution $x = a \tan \theta$. Instead of using this
result, the groups then evaluated $\int dx/(9 + x^2)$ by the substitution $x = 3 \tan \theta$.
Although some repetition can be a useful aid in learning, many students
repeated the same useful work over and over again, long after they had mastered
the appropriate integration technique. 
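
The transfer the students missed can be made explicit. As a sketch, assuming $a > 0$ and writing $C$ for the constant of integration (notation chosen here), the general substitution yields a formula into which $a = 3$ can simply be inserted:

```latex
\begin{align*}
x = a\tan\theta,\quad dx = a\sec^2\theta\,d\theta:\qquad
\int \frac{dx}{a^2 + x^2}
  &= \int \frac{a\sec^2\theta}{a^2\sec^2\theta}\,d\theta
   = \frac{\theta}{a} + C
   = \frac{1}{a}\arctan\frac{x}{a} + C, \\
a = 3:\qquad
\int \frac{dx}{9 + x^2} &= \frac{1}{3}\arctan\frac{x}{3} + C.
\end{align*}
```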

In working with derivatives of composite functions, most students did not 
perceive the need to apply the chain rule in new situations. Although the
student groups correctly developed the formula for the derivative of each major
new function, they then made erroneous statements such as $d(\sin 2x)/dx =
\cos 2x$, $d(e^{3x})/dx = e^{3x}$, $d(\ln 4x)/dx = 1/(4x)$. In each instance it was
necessary for the teacher to remind students that they were dealing with a composite
function.
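
A quick numerical check, offered here only as an illustration, shows how far such statements are from the true slopes. The function name `central_difference`, the step size, and the test point `x = 0.7` are assumptions of this sketch; only the sine and logarithm examples are used.

```python
import math

def central_difference(f, x, h=1e-6):
    """Estimate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7  # arbitrary test point, chosen for this sketch

# d(sin 2x)/dx: the students wrote cos 2x; the chain rule gives 2 cos 2x.
slope_sin = central_difference(lambda t: math.sin(2 * t), x)
sin_without_chain = math.cos(2 * x)
sin_with_chain = 2 * math.cos(2 * x)

# d(ln 4x)/dx: the students wrote 1/(4x); the chain rule gives 4 * 1/(4x) = 1/x.
slope_ln = central_difference(lambda t: math.log(4 * t), x)
ln_without_chain = 1 / (4 * x)
ln_with_chain = 1 / x

# Compare the measured slope with each candidate formula.
print(abs(slope_sin - sin_with_chain), abs(slope_sin - sin_without_chain))
print(abs(slope_ln - ln_with_chain), abs(slope_ln - ln_without_chain))
```

In each printed pair the first gap should be negligible and the second clearly not, since the measured slope matches the chain-rule value rather than the students' formula.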

In problems of integration almost all students had persistent
difficulties in working with the differential. For example, in evaluating
$\int (\sin^2 x)(\cos x)\,dx$, the groups let $u = \sin x$ and evaluated $\int (u^2)(\cos x)\,dx$
without expressing all of the integrand in one variable. Later the groups used
the relation $u = \sin x$ to derive the incorrect expressions $du = \cos x$, or
$du/dx = (\cos x)\,dx$. In many problems with integration by substitution or by
parts, the students first forgot to convert all the variables in the integrand.
Then, when they tried to do so, they frequently ended up with either too many 
or too few differential symbols. 
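
For comparison, a sketch of the bookkeeping that was intended, written with a generic exponent $n \ne -1$ (a placeholder introduced here, not necessarily the exponent of the class example) and $C$ for the constant of integration:

```latex
\begin{gather*}
u = \sin x \;\Longrightarrow\; \frac{du}{dx} = \cos x \;\Longrightarrow\; du = \cos x\,dx, \\
\int \sin^{n} x \,\cos x\,dx = \int u^{n}\,du = \frac{u^{n+1}}{n+1} + C
  = \frac{\sin^{n+1} x}{n+1} + C.
\end{gather*}
```

Each side of $du = \cos x\,dx$ carries exactly one differential, which is the balance the students' expressions lacked.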

In problems which could be solved in several different ways, the stu- 
dents often preferred to use the technique they had learned first. For example, 
the integration technique of trigonometric substitution was first introduced by 
the problem of computing the area of a circle. To evaluate the resulting integral,
the groups used polar coordinates and set $x = r \cos \theta$. Then, in many other
integrals involving the expression $r^2 - x^2$, the students always used the
substitution $x = r \cos \theta$, rather than the more standard substitution $x = r \sin \theta$.
Several students said that $x = r \sin \theta$ would work, but that they liked their first
approach better. 

The students' intuitive notions about sequences were surprising to the 
author. Almost all the students believed initially that the listing 1, 1, 1, 1, ...
did not describe a sequence, since the n-th term did not change and was not
specified by a formula involving n. After resolving this issue, almost all
students stated that the sequence 1, 1, 1, 1, ... did not have a limit, since "it's
not getting close to any number; it's there already."

Most students stated that the sequence 0, 2, 0, 2, 0, 2, ... converged
to two limits, and were upset when the teacher said the sequence had no limit.
Their discomfort was alleviated somewhat when the teacher introduced the
notion of a subsequence. 
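
The role of subsequences here can be sketched as follows. The closed form $a_n = 1 + (-1)^n$ with $n = 1, 2, 3, \dots$ is one convenient way, chosen here, to write the sequence 0, 2, 0, 2, ...:

```latex
\begin{gather*}
a_n = 1 + (-1)^n, \qquad
a_{2k-1} = 0 \to 0, \qquad
a_{2k} = 2 \to 2 \quad (k \to \infty).
\end{gather*}
```

If the sequence had a limit, every subsequence would converge to that same limit. Two subsequences converge to different values, so the sequence has no limit, and 0 and 2 are limits of subsequences only.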

In trying to solve problems or do proofs using the definition of the limit 
of a sequence or a function, the students encountered great conceptual and 
technical difficulties. Comments from several students indicated that they did 
not perceive the statement as a reasonable definition. "If that's a definition, it's
the weirdest one I've seen in my entire life." Moreover, most students did not
find the proofs of limit theorems useful. "There's no reason to prove a theorem
unless there is some doubt about the result, and I never had any doubt about 
the sum of the limits being the same as the limit of the sum." Many students 
were not convinced by proofs of the limit theorems. "That proof is nothing but 
a bunch of equivalent statements with complicated notation. It doesn't prove 
anything to me." These attitudes and difficulties were not caused by lack of
prior concrete experience; the students had spent several weeks working with a
variety of sequences before encountering the formal definition of the limit. 

The student concept of a function seemed to include several basic but 
unstated assumptions. Students invariably drew the graph of a function as a
smooth curve with a small number of relative maxima or minima. A student
said, and others agreed, that "there are only three possibilities at an endpoint
of an interval. Either the curve comes in level or it comes from below or from
above." It appeared that the student concept of a function on a closed interval
actually meant a continuous, differentiable function with a finite number of
maxima and minima. 

Students were almost always unable to state or recognize the definition 
of continuity. In many problems they automatically used the property

$\lim_{x \to x_0} f(x) = f(x_0)$

without asking whether the property held for the given function. Moreover,
students tended to assume the existence of absolute maximum and minimum 
values for any function defined on a closed interval. Most believed that there 
was something unnatural or artificial about functions with discontinuities. As 
they put it, these functions were "made up" by moving points out of their 
proper location, adding points which did not belong in the domain, putting in 
steps, or creating infinitely many oscillations in the graph. 

The students seemed to think at times that all functions were
differentiable. For example, for the function $f(x) = |x|$, the students stated that
they were going to find $f'(0)$. When the right- and left-hand limits of the
difference quotients turned out to be different, most students thought they had
made an arithmetic mistake. Moreover, the students almost always reversed
the relationship between differentiability and continuity and stated that all
continuous functions were differentiable.
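
The computation that puzzled the students can be written out. This is a standard sketch, with $h$ denoting the increment (notation chosen here):

```latex
\begin{gather*}
\lim_{h \to 0^+} \frac{|0 + h| - |0|}{h} = \lim_{h \to 0^+} \frac{h}{h} = 1, \qquad
\lim_{h \to 0^-} \frac{|0 + h| - |0|}{h} = \lim_{h \to 0^-} \frac{-h}{h} = -1.
\end{gather*}
```

The one-sided limits differ, so $f'(0)$ does not exist even though $f(x) = |x|$ is continuous at 0. Differentiability implies continuity, not the reverse.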

Many students made a distinction between theory and problems. As 
one student put it, "Calculus should be 25% theory and 75% problems." The 
distinction between theory and problems seemed to depend largely on the pres- 
ence or absence of arbitrary functions. Although most students preferred 
problems over theory, they sometimes distinguished between useless theory
and useful theory. Useless theory consisted of propositions intended to "prove 
the obvious" or "straighten out things we already know." Most students 
deemed as useless the definition of the limit and the development of the natural 
logarithm as an integral. Useful theory consisted of general propositions 
which had applications to interesting problems with specific functions. Many 
students accepted as useful theory the proof of the fundamental theorem of 
calculus and the development of the formula for the volume of a surface of 
revolution. 

The difficulties encountered by better-than-average students in a
discovery approach to calculus were quite surprising, even to an experienced
teacher of calculus. Hopefully, these difficulties will not obscure the success of
student groups in proving the major theorems of calculus, developing
techniques for solving classes of problems, stating insightful conjectures, and
coming up with problem solutions and proofs not previously known to the
teachers. 

Data for Evaluating the Pilot Study

Comparison between Two Methods of Instruction

As part of the evaluation for the pilot study, a final examination was
given to students in the small group discovery class, or "discovery group," and
to a control group. The control group consisted of 51 students who learned 
calculus via the lecture-discussion system. The 51 students were a subset of a 
lecture class which met with a professor for three lectures per week. On the 
other 2 days, the 51 students met in four separate discussion sections ^ed by 
four different teaching a.ssistants. 

There were differences between the two groups with respect to group
composition and conditions pertaining to the examination. The 12 members of
the discovery group were all volunteers for a special class; all of these students
had grades of A or B in their high school mathematics classes. The students in
the control group were not volunteers and had not been subject to any special
entrance requirements. In comparing the two groups, there was no attempt to
control such variables as SAT scores, IQ scores, sex, or grades in high school
mathematics.

The final examination was administered at the end of the second
semester of the 1-year pilot study. During that year, the students in the discovery
group had taken no examinations or quizzes in class. Students in the control
group had taken a final examination during the first semester, several hourly
examinations during both semesters, and quizzes at the discretion of the
various teaching assistants.

The discovery group took the final examination designed for the
control group. The examination involved the recall of facts and standard
computation skills; it did not require the solution of any difficult problems or the
formulation or proof of theorems. The examination did not include material
such as limits of sequences, which was covered in the discovery group but not in
the control group. However, the 23 items on the examination did include seven
items covered in the control group but not in the discovery group; the members
of the discovery group were told to omit these items. Thus, the members of the
discovery group had more time available during the 1-hour test. The
comparison was made on the basis of the 18 items common to both groups; the
examinations of the discovery group were scored by the author.

The raw scores, mean, median, and standard deviation on the final
examination are presented in Table 1. The mean and median for the discovery
group were 55.25 and 55.5, respectively. The mean and median for the control
group were 52.35 and 53.0. On a section-by-section basis, the mean and median
were higher for the discovery group than for the four sections in the control
group, with the exception of one section in which the median was the same
as for the discovery group. In addition, the standard deviation was 7.59 for the
discovery group and 11.57 for the control group. An F-test was run on the data,
yielding an F-ratio of .679 (p < .413). Hence, the difference was not
statistically significant.

Table 1

Scores on the Final Examination

                                    Control group section
                                 ----------------------------   All control
                     Discovery   1       2       3       4      groups
Number of students   12          12      16      12      11     51
Total of scores      663         662     864     601     543    2,670
Arithmetic mean      55.25       55.17   54.00   50.08   49.36  52.35
Median               55.5        55.5    50.5    51.5    51.0   53.0
Standard deviation   7.59        7.00    11.86   14.66   11.77  11.57
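
The reported F-ratio can be checked from the summary statistics alone. The sketch below assumes the test was a one-way analysis of variance comparing the discovery group with the pooled control group, and that the reported standard deviations are sample standard deviations (n - 1 in the denominator). Neither assumption is stated in the text, and the function name `f_ratio_two_groups` is hypothetical.

```python
def f_ratio_two_groups(n1, mean1, sd1, n2, mean2, sd2):
    """One-way ANOVA F-ratio for two groups, computed from summary statistics.

    Assumes sd1 and sd2 are sample standard deviations (n - 1 denominator).
    """
    # Within-groups mean square: pooled variance on n1 + n2 - 2 degrees of freedom.
    within = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Between-groups mean square on 1 degree of freedom.
    between = (mean1 - mean2) ** 2 / (1.0 / n1 + 1.0 / n2)
    return between / within


# Means recomputed from the totals and group sizes given in Table 1.
discovery_mean = 663 / 12   # discovery group
control_mean = 2670 / 51    # all control sections combined

f_ratio = f_ratio_two_groups(12, discovery_mean, 7.59, 51, control_mean, 11.57)
print(round(f_ratio, 3))
```

Under these assumptions the computed ratio, on 1 and 61 degrees of freedom, comes out at about .68, which agrees with the value reported in the text.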

The author tabulated the number of perfect solutions achieved in each
group for each item of the test, excluding those items which were omitted for
the discovery group. From these data the average number of perfect solutions
per student in each group was computed. The average number of perfect
solutions per student in the discovery group was 12.50. The corresponding
numbers for the four control sections and the total control group were 12.25, 12.38,
10.92, 10.91, and 11.69, respectively. Therefore, the average number of
perfect solutions per student was higher in the discovery group than in each
control section taken separately and in the total control group.

Take-home Examinations 

In the discovery group there were seven take-home examinations
during the pilot study: three during the first semester and four during the
second semester. The results of these examinations, with scores converted to a
100-point scale, are summarized in Table 2.

On five of the seven take-home examinations, the means and medians
were higher than 80. On the remaining two examinations, which were quite
difficult, the means and medians were higher than 70. Among the 84
individual scores on the examinations, only 10 scores were lower than 70. The
absolute minimum score was 60, which occurred only once and on the first exam.
No member of the class was the low scorer on more than two of the seven
examinations.

Table 2

Results of the Take-home Examinations

Exam number          1      2      3      4      5      6      7
Mean                 72.6   87.0   72.6   80.8   86.6   88.8   90.5
Median               74.0   86.5   72.0   82.0   86.5   91.0   90.5
Standard deviation   7.90   7.97   6.30   9.16   7.53   6.38   8.92

The Questionnaire 

A 90-item open-ended questionnaire was used to determine student
reactions to the small group discovery class. In constructing the questionnaire,
it was decided that open-ended items might provide a good deal of information
not readily available in another form, although such items would be difficult to
classify and count. The questionnaire was loosely constructed and no claim
was made about its reliability; it was not suitable for use in a carefully
controlled investigation. Nevertheless, for an informal evaluation of the first trial
of an instructional method, it furnished the necessary information.

The questionnaire was divided into five major sections, dealing with
the work groups, the mathematics content, the teacher, various practices and
policies, and basic reactions to the class. The questionnaire was given to the
students about 2 weeks before the end of the semester. For almost every item
on the questionnaire, the student responses were classified into various
categories and counted, although for certain items it was not possible to form very
neat categories. Since the flavor of the student responses cannot be conveyed by
an item summary, there follow some quoted responses to one question.

How did working with other students influence your learning? 

Categorized Responses     Frequency

a. Uncertain                  2

b. Positive Effects          10

Sample Responses 

a. It is very difficult to judge. 

b. Other students, no matter who, force you to learn more. 

It helped me because I gained confidence showing people how to work a
problem, also realized my limitations when people showed me how to do a
problem.

A lot of times when I did not understand something the other members of
the group helped to clear things up.

The working out of problems together not only removed much of the
frustration of working difficult ones by oneself, but it also helped keep up a
constant renewal of interest.

I learned to depend on working it out myself or with the help of others
instead of relying on a book. In other words I think I developed a little
original thinking.

I think I learned a lot more this year than I did in all 3 years of high school 
math. 

I think you learn from students while you're taught by teachers. I think you 
know what I mean. With a student you understand, with a teacher you too 
willingly accept. 

Summary of Major Results from the Questionnaire

• Two-thirds of the class members either did not enjoy the theory or
enjoyed it only sometimes.

• No student reported a decrease in interest in mathematics during 
the pilot study, and one-third of the students reported an increase in their 
interest in mathematics. 

• No student reported a decrease in skill in problem solving during 
the pilot study, and more than half of the class members believed that there 
was an increase in their problem solving skill. 

• Most of the students said that working with others had positive ef- 
fects upon their own learning. 

• Only one-fourth of the class members were never concerned about
covering enough material during the pilot study.

• The students reported that the teacher spent the right
amount of time with each work group and gave enough hints.

• More than half of the students said that the teacher was effective in 
giving hints, and the other students said he was sometimes effective. 

• Everyone perceived the teacher more as a helper and guide than as 
an evaluator and critic. 

• Almost all the students reported a closer, more personal relation- 
ship with their mathematics teacher than with their other teachers. 

• All the students believed that the two-member group was too small 
and the five-member group was too big. There was a division of opinion about 
the relative merits of the three-person group versus the four-person group. 

• Three-fourths of the class members expressed a desire to avoid 
working with various individuals. 

• More than half of the class members were sometimes in a group 
with a person they didn't like. 




• Two-thirds of the class members reported feeling completely free to
ask questions when they didn't understand something. The other people felt
very free, but this depended on the person being asked.

• Three-fourths of the class members said that they had an adequate
opportunity to express their ideas in their groups. The other people said it
depended upon the particular group.

• Almost half of the class members reported competing with others. 

• While some of the groups functioned very well, others did not.

• Most of the class members said that working problems every day 
did not become routine or monotonous. 

• Three-fourths of the students said that they never read a calculus
book during the pilot study.

• Three-fourths of the students reported feeling little or much less
grading pressure than in their other classes.

• Half of the class members saw other class members socially. 

• Most of the students said that their calculus class was better, more 
stimulating, or much more stimulating than their other classes. 

• More than half of the class members said that their attitude toward 
the class changed for the better during the year, and the other students said 
there was no change in their attitude. 

• Seven students said that they would have no reservations about
taking future courses taught by this method. Five people expressed reservations,
which were the following: doubt about learning as much as in ordinary classes,
the great dependence of this method upon the particular instructor, the desire
to try a lecture-quiz course in math, the wish to avoid contact with other
people, and fear of not being able to make it in a regular class.

• The students mentioned the following advantages of the method:
less grade pressure, more interesting, easier, more opportunity to clear up
questions, more fun, more challenging, and greater student-teacher contact. In
addition, the method builds good relationships with others and stimulates the
desire to learn math; students gain ideas from other people, learn to teach, get
more help, learn more thoroughly, develop greater understanding, think for
themselves, and think creatively.

Conclusions and Suggestions for Investigation 

The pilot study demonstrated that an entire first-year course in
calculus can be taught by the small group discovery method. The student
groups succeeded in proving the theorems of calculus and developing
techniques for solving various classes of problems with only limited guidance by
the teacher.

An informal comparison, not employing a formal experimental design,
was made of student achievement in the small group discovery class and in a
lecture-discussion section. On the common items of a final examination
dealing with basic facts and computational skills, the small group discovery class
performed at least as well as the lecture-discussion class. Inspection of the
syllabi for the two classes showed that the small group discovery class dealt
with as much material as the lecture section, but not with exactly the same
material. Finally, the students in the discovery class performed very creditably
on seven nontrivial take-home examinations.

A 90-item open-ended questionnaire was given to the students in the
discovery class, with the following general results. On the negative side, the
students did not find certain mathematical topics to be either interesting or
useful. Most students were concerned for varying periods of time about
covering enough material. Group conflict or frustration sometimes occurred,
especially when the mathematical problems were too hard. The formation of
effective and satisfying working groups was rather difficult for the students.

On the positive side, the pilot class had either positive or nonnegative
effects upon each student's interest in mathematics and estimate of his or her
problem-solving skill. Most students believed that working with others had
positive effects on their own learning and that working problems every day did
not become routine or monotonous. Almost all of the students had a closer,
more personal relationship with their mathematics teacher than with their
other teachers, and half of the class members saw other class members socially.
Most of the students found their calculus class more stimulating than their
other classes, and the attitudes of all the students toward the class either stayed
the same or improved during the year.

A number of students asked for an extension of the class for another 
semester. The Department of Mathematics consented to schedule a third se- 
mester continuation for the small group discovery class. 

On the basis of the evidence, it seems fair to conclude that the pilot 
study was a successful first attempt to implement the small group discovery 
method. A number of questions will now be presented for future investigation. 

Questions for Further Investigation 

This chapter describes the development and initial tryout of a new
mathematics instructional system designed to foster active learning, thinking,
student pacing, and interpersonal communication. A number of the questions
that arose as a result of the pilot study fall into three broad areas. The three
areas are concerned basically with mathematics questions, further
development of the small group discovery method, and the applicability of the method
to different student populations. These questions are listed here.

• Does the small group discovery method increase student interest in
mathematics?

• Does the small group discovery method increase student skill in
solving mathematical problems? Is that increase greater when the teacher
emphasizes the use of heuristic techniques? (Editor's note: A recent study by
Loomer (1976) does not support an affirmative answer to these questions.)

• Can mathematical creativity be fostered through participation, over 
an extended period of time, in a small group discovery approach? 

• Is it possible to develop written curriculum materials in calculus for
small group instruction so that students will perceive most topics as being
interesting or useful? Or, in any method of instruction, will calculus students
find some topics (e.g., proofs using the definition of the limit and the
development of the natural logarithm as an integral) uninteresting, not useful, and
difficult?

• Can calculus students improve and feel comfortable with
continuous functions only if they have extensive prior experience with noncontinuous
ones?

• Can a mathematical topic be understood by average mathematics 
students using expository instruction and by high-achieving mathematics stu- 
dents using a small group discovery approach over equal periods of time? 

• If a mathematical topic cannot be developed by high-achieving
mathematics students using a small group discovery approach, can that topic
be understood by average students using an expository approach?

• What are the differences in understanding and the time required to
develop that understanding between equivalent groups of students who are
instructed by an expository approach and by the small group discovery
method?

• Can a small group discovery method be used in mathematics
instruction at all school and collegiate levels? If so, what would the effects of this
technique be upon mathematics learning and the quality of interpersonal
relationships?

• When the small group discovery method is used with college stu- 
dents, what is the optimal length for class meetings? 

• What is the optimal size of a work group in a mathematics class?
Does the optimal size of a work group vary with the type of thinking or skills
required to solve a problem?

• What is the optimal size of a mathematics class taught by the small
group discovery method?

• What procedures can be developed to facilitate the formation of
effective and satisfying work groups?

• What are the ways to improve group functioning and interpersonal 
relationships in a small group discovery method mathematics class? 

• What grading systems are especially well suited to the small group
discovery method?

• What types of students are best suited for learning using the small 
group discovery method? 

• Can the small group discovery method function positively for
students who wish to change their interpersonal style of behavior (e.g., their
ability to cooperate or to share responsibility)?

• How can the small group discovery method be varied so that each 
group works at its own pace with a set of written materials? 

Further Work 

Since the pilot study, the author has used the small group method to
teach courses in calculus, honors calculus, abstract algebra, transformation
geometry, non-Euclidean geometry, and mathematics for elementary school
teachers. In addition, some colleagues have used small groups in teaching
precalculus mathematics, linear algebra, advanced calculus, complex variables,
and algebraic topology. There were no special admissions requirements for
these classes; any student who had the prerequisites was admitted.

After the pilot study, the author made several changes in his teaching
of small groups. Class sizes were larger, typically ranging from 20 to 28
students. For classes with more than about 28 students, an assistant was needed to
help supervise the groups.

The teacher frequently introduced new concepts and problems in
written form, rather than in class discussions. The use of dittoed worksheets or a
special text allowed each group to set its own pace when working through the
materials. The teacher did not provide any notes containing a record of the
students' accomplishments. Students took their own notes if they wished to do
so.

Considerable care was taken in forming the work groups. At the
beginning of each semester, students were asked to switch groups frequently in
order to meet many different class members. After this initial period of
acquaintance, students were asked to write down privately the names of class members
they preferred to work with, those they could tolerate if necessary, and those
they wished to avoid. The teacher then used this written information to form
compatible groups which remained together throughout the semester.

The course grade was not always based on the A-B scale. The grade
was based on take-home exams, attendance, and homework. The teacher
checked some homework problems, and class members took turns checking
others. A student could turn in the same homework problem several times
until it was finally correct.

Since 1972 the author has offered an annual graduate seminar for
teachers who wished to learn about using small group instruction.
Participants have been mathematics teachers at the elementary, secondary, and
college levels. In the seminar, the theory, practical techniques, resource materials,
and research literature for small group learning of mathematics have been
taught. Concurrent with the seminar, each participant has taught a course
employing the small group method. The seminar has been used as a support
group for teachers to try out new ideas and to resolve problems in their
teaching with small groups.

Other individuals have employed the small group discovery method
with differing student populations and for various reasons. Using this method,
a model for developing curriculum materials has been evolved by observing the
work groups (Davidson, McKeen, & Eisenberg, 1973; Eisenberg, 1970;
McKeen, 1970) and learning hierarchies have been developed by small groups
(Seidl, 1971; Shriner, 1970). Buchoff (1970) has reported the development of
programmed materials for use with pairs of high school plane geometry
students, and Jordy (1976) reported the development of small group discovery
lessons for use in the Secondary School Mathematics Curriculum
Improvement Study Materials (Fehr, Fey, & Hill, 1972a, 1972b). Poppendieck
(1971) and Thoyre (1970) have used small group methods in teacher
education courses. Several studies (Brechting & Hirsch, 1977; Davidson & Urion,
1977; Gallicchio, 1976; Grant, 1975; Mildenbrand, 1975; Kenney, 1974;
Klingbeil, 1974; Klingbeil & Davidson, in press; Loomer, 1976) have
attempted to determine the effects of using the small group discovery method.

Research on the small group discovery method has been supplemented
by developing text materials especially designed for use with that method.
Thus far materials have been developed for courses in elementary algebra
(Stein & Crabill, 1972), plane geometry (Chakerian, Crabill, & Stein,
1972), abstract algebra (Davidson & Gulick, 1976), mathematics for
elementary teachers (University of Maryland Mathematics Project, 1978;
Weissglass, in press) and mathematics for liberal arts majors or elementary
teachers (Knaupp, Smith, Shoecraft, & Warkentin, 1977). At present, text
materials are being prepared for courses in linear algebra (Dancis,
unpublished manuscript) and the calculus of one variable (Leach & Davidson,
unpublished manuscript).

A more complete description of the small group discovery method and
of the pilot study can be found in Davidson (1971a). Other published papers
dealing with the small group discovery method are Davidson (1971b),
Davidson (1974), Davidson (1976), Davidson, Agreen, and Davis (1978),
McKeen and Davidson (1975), and Weissglass (1976, 1977).

Chapter 4



Development of a Unit of Number 
Theory for Use in High School, Based 
on a Heuristic Approach 

Shlomo Libeskind 

The Problem and Its Background 

"How does it happen that so many refuse to understand mathemat- 
ics?" Poincaré asked (1929, p. 43; 1969a; 1969b, p. 295) at the beginning of
the century. This question is at least as relevant today as it was then. 

In spite of recent efforts to develop new curriculums, textbooks, and 
materials, the number of students failing or doing badly in mathematics is 
enormous. It is common to hear students at all levels, high school and college, 
complain that mathematics is a dry uninspiring subject and that it depends 
upon many incomprehensible tricks. 

Proof and deductive reasoning are at the very heart of mathematics, yet 
in textbooks and classrooms, mathematics is usually presented as a finished
product. The student is rarely told how one starts a proof or proceeds from one 
step to the next. As a result many find it difficult to reproduce proofs they have 
learned and almost impossible to prove new statements and solve more chal- 
lenging problems. 

Traditionally the student's first encounter with proof is in high school
geometry (usually tenth grade). Some newer curriculum programs present
proof along with algebra, although proofs in beginning high school algebra
involve field axioms and are difficult for most students, even when well
presented. Later in algebra, emphasis is placed on techniques for solving
particular types of problems, and even when proofs are presented they are rarely
emphasized. The problems that most students are able to solve are usually
routine. In geometry, where students encounter proofs and more challenging
problems, they also experience more difficulty.

In view of this situation this author wanted to develop a unit on proof
for use in high school that would be accessible to students with a background
in beginning algebra. Thus it was decided to develop a unit in number theory
using a specially designed heuristic approach, based on the teaching and
learning of problem solving advocated by Polya (1954a, 1954b, 1962, 1965). When
using this approach in proving a theorem or solving a problem, a teacher does
not merely justify each step by referring to a previously proved theorem,
definition, or axiom but shows why it is reasonable to start the proof in one way
and not another and how one knows how to proceed from one step to the next.

The overall objectives of the study were to develop such a unit in
number theory and to test its feasibility by presenting it to an ungraded class of
high school students. Three basic questions were asked: (a) Can the students
reproduce the proofs of the theorems in the unit? (b) Can the students
understand the meaning of the theorems? (c) Can the students apply the methods
used in proving the theorems in the unit to solve new problems which include
proving statements the students have not seen before?

Development of the Unit 

The development and tryout of the number theory unit were based on
a curriculum development model advocated by Romberg and DeVault
(1967). According to that model, the steps in developing an instructional
system are analysis (mathematical and instructional analysis), pilot
examination, validation, and development. The study carried the development of the
unit only through the first two phases of this model. As Romberg and DeVault
point out, these two phases are of great importance in the development of an
instructional system (1967, p. 107).

Mathematical and Instructional Analysis 

In order to keep the mathematical prerequisites for the unit minimal, it
was decided to work within the system of whole numbers. Thus, the symbol
$d \mid a$ was defined as follows: $d \mid a$ if there is a whole number $k$
such that $a = kd$. The following main theorems and topics were chosen:

Theorem 1: If $d \mid a$ and $d \mid b$ then $d \mid (a + b)$.

Theorem 2: If $d \mid a$, $d \mid b$, and $d \mid c$ then $d \mid (a + b + c)$.

Theorem 3: If $a > b$, $d \mid a$, and $d \mid b$ then $d \mid (a - b)$.

Theorem 4: If $d \mid a$ and $k$ is a whole number then $d \mid ka$.

Theorem 5: If $d \mid a$ and $d \mid b$ then $d \mid (ka + nb)$ where $k$ and $n$ are
whole numbers.

Theorem 6: If $a \mid b$ and $b \mid c$ then $a \mid c$.

Divisibility by 2, 4, and 5.

The meaning of "if and only if."

Theorem 7: If $d \mid a$ and $d \nmid b$ then $d \nmid (a + b)$.

Theorem 8: There are infinitely many primes.

Theorem 9: If $n$ has no prime factors less than or equal to the square
root of $n$, then $n$ is prime.

Sieve of Eratosthenes.

Theorem 10: $(a, b) = (a - b, b)$ where $(a, b)$ denotes the greatest
common divisor of $a$ and $b$.

Euclidean algorithm.
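
Several of the topics listed above translate directly into short procedures. The following is a minimal sketch, not material from the unit itself: the function names are hypothetical, the primality check assumes $n > 1$ as Theorem 9 implicitly does, and the greatest common divisor routine uses the repeated-subtraction form suggested by Theorem 10 rather than the usual division form of the Euclidean algorithm.

```python
import math


def divides(d, a):
    """Return True if d | a, i.e. a = k*d for some whole number k."""
    if d == 0:
        return a == 0
    return a % d == 0


def is_prime(n):
    """Theorem 9: n > 1 is prime if it has no prime factor <= sqrt(n).

    Testing every whole number from 2 to sqrt(n) covers all such primes.
    """
    if n < 2:
        return False
    return not any(divides(d, n) for d in range(2, math.isqrt(n) + 1))


def sieve(limit):
    """Sieve of Eratosthenes: return the primes up to and including limit."""
    if limit < 2:
        return []
    is_candidate = [True] * (limit + 1)
    is_candidate[0] = is_candidate[1] = False
    # By Theorem 9, crossing out multiples of p is only needed for p <= sqrt(limit).
    for p in range(2, math.isqrt(limit) + 1):
        if is_candidate[p]:
            for multiple in range(p * p, limit + 1, p):
                is_candidate[multiple] = False
    return [n for n, keep in enumerate(is_candidate) if keep]


def gcd_by_subtraction(a, b):
    """Theorem 10: (a, b) = (a - b, b); subtract until both numbers agree."""
    if a <= 0 or b <= 0:
        raise ValueError("both arguments must be positive whole numbers")
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a


print(sieve(30))
print(gcd_by_subtraction(48, 18))
```

The subtraction form is slow for large inputs, but it mirrors the statement of Theorem 10 step by step, which is why it is used here.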

A task analytic approach developed by Gagne (1965) guided the
development of the unit. This approach is well described by King (1970):



The idea is to express the objectives of instruction in terms of observable
performance tasks. If the instruction is successful, the students will
demonstrate the ability to perform the specified behavioral objectives.
Hence, the success of the instruction is measured in terms of student
performance on predetermined performance objectives. Once the
curriculum developer has specified these objectives, a task analysis is
performed. The task analytic
procedure was developed by Gagne to train human beings to perform
complex tasks. The basic idea of this approach is to break down each
behavioral objective into prerequisite subtasks; these subtasks may in
turn be analyzed into finer subtasks. The procedure continues until one
reaches a set of elemental tasks which cannot or need not be further
analyzed. If properly done, the task analysis should yield a hierarchy of
tasks which indicate the steps a student must take in order to learn the
terminal behavioral objective. The hierarchy indicates how instruction
would proceed: one starts with the simplest tasks and learns each
subtask until the terminal objective has been mastered. (pp. 48-49)

Any proof or solution to a problem has two basic components: (a)
knowledge of and ability to manipulate subject matter content, and (b) a plan
or strategy which permits the student to use the subject matter content to form
a valid argument.

The task analysis related to the first component is usually quite simple
to identify. The proof of Theorem 1, for instance, needs the application of the
definition of divides, the substitution principle, and the distributive law.
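
The way these three ingredients fit together can be sketched as follows. This is a standard argument written out for illustration, not necessarily the wording used in the unit; the whole numbers $k$ and $m$ are names introduced here.

```latex
\begin{align*}
d \mid a &\implies a = kd \text{ for some whole number } k
  && \text{(definition of divides)} \\
d \mid b &\implies b = md \text{ for some whole number } m
  && \text{(definition of divides)} \\
a + b &= kd + md
  && \text{(substitution principle)} \\
      &= (k + m)d
  && \text{(distributive law)} \\
&\implies d \mid (a + b)
  && \text{(definition of divides, since } k + m \text{ is a whole number)}
\end{align*}
```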

The second component, the plan or strategy, is of utmost importance.
Being able to find a strategy makes the difference between finding a proof (or
solving a problem) or not finding one. Here are the greatest difficulties that
most students encounter. As already pointed out, some textbooks outline a plan
or strategy for proving a theorem or solving a problem, but very often these
plans are merely recipes for solutions and do not explain to students why this
plan was chosen and not another, or why each step within the plan was taken.
In this way most plans fail to show students how they should go about finding
a proof or solution on their own.

The present study put great emphasis on showing how strategies could
be found. Thus a modification of Polya's (1945) heuristic approach was used.
A similar approach was used by the present writer in two expository works:
Libeskind (1968) and Beck, Bleicher, and Crowe (1970, in particular
Chapter 2). The main idea in the approach was to show the student why it is
reasonable to take each step and to point out alternative approaches.

One of the goals of the study was to instruct students in the use of the
heuristic process, that is, to encourage them to ask heuristic questions when
confronted with reproducing proofs of the theorems and solving new problems.
To achieve this, it was decided to encourage active student guessing while
proving the theorems and solving problems. Students were asked to suggest
what the next step in a particular proof should be. To avoid situations in
which one student responds to a question and the teacher continues as if the
whole class responded, the response of more than half of the students was
sought by asking the students to write answers in their notebooks. Students
who did not get the answer were given further hints.

To discourage memorization, proofs were written in several different
forms: two column, story type, a combination of these two types, and diagram
form. Writing proofs in different forms is also valuable for other reasons. A
two-column proof is helpful for beginners, since it is structured in the form
Statement-Reason, and reminds the student to give a corresponding reason to
the statement. A story-type proof is universally used in mathematics as it is an
easy way to write and explain a longer proof. A diagram approach is
sometimes useful in discovering a proof.

Often the same theorem or problem was proved or solved by several
different methods. The reasons for this are:

• A student who does not see one approach may find another
understandable.

• One important general principle appears to be this: wherever
possible, the child should have some intrinsic criterion for deciding the
correctness of answers, without requiring recourse to authority. . .

• In more advanced work in later grades, solving problems by several
different methods, recognition of patterns, and even the use of simple logic will
play the role of foundation for deciding correctness without recourse to
authority (Cambridge Conference on School Mathematics, 1963, pp. 15-35).

The second goal of the experiment was to demonstrate that students
could master the objectives of the unit. To accomplish that goal, the idea of
mastery learning was used. This assumes that given enough time all or almost
all students can learn the intended course material and it is the task of the
instructor to find the means and methods to obtain this mastery. Mastery
learning has been extensively reviewed in a book edited by Block (1971).
King (1970) and Shepler (1969) demonstrated that mastery learning in
mathematics can be successfully used in elementary schools.

For this study, the mastery criteria were: (a) a student must respond
correctly to at least 73% of the items on the test in order to be considered a
master, and (b) at least 70% of the students had to be considered masters
on all tests.
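
As a rough illustration of how the two criteria combine, the following Python
sketch checks them for a single test. The function names, the data layout (one
count of correct items per student), and the sample scores are assumptions made
for the example; they are not taken from the study.

```python
def is_master(correct, total, item_threshold=0.73):
    """Criterion (a): the student answers at least 73% of the items correctly."""
    return correct / total >= item_threshold


def group_meets_criterion(scores, total, item_threshold=0.73, group_threshold=0.70):
    """Criterion (b): at least 70% of the students are masters on this test."""
    masters = sum(is_master(c, total, item_threshold) for c in scores)
    return masters / len(scores) >= group_threshold


# Hypothetical numbers of correct items for 10 students on a 20-item test.
scores = [20, 18, 15, 15, 14, 19, 16, 12, 17, 15]
group_ok = group_meets_criterion(scores, total=20)
```

In the study the group criterion had to hold on every test, so a full check
would apply `group_meets_criterion` to each test in turn.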

In order to achieve mastery learning of the behavioral objectives of this
instructional unit, the following were used in the studies:

1. A heuristic approach was employed. 

2. On each test students were rated as masters or nonmasters. If a 
student was a nonmaster on a topic, that student was given further instruction 
and an opportunity to take a parallel test. If graded a master on a second or 
third test he or she was counted as a master for the topic.

3. Seven booklets were developed which the students used to learn and 
review most of the theorems in the unit. On each page of the booklets there was 
expository text and questions. Students were asked to answer the questions 
and compare their responses with the answers on the following page of the
booklet. Sometimes immediately after the question there appeared the word
"Hint" with a number after it. In that case the students could use the
corresponding hint on the last page of the booklet, but only after trying
unsuccessfully to answer that question.

4. Problem sheets were given daily as homework; these were corrected 
and mistakes were pointed out. Solutions and mistakes were discussed in class 
and individually if necessary. 

5. If the students were masters on all the mastery tests they would be
considered masters of the unit. 



The Pilot Study

To aid the development of materials appropriate for high school
students, a pilot study was conducted in the summer of 1970. The pilot consisted of
25 sessions, including testing sessions, of about 50 minutes each. After each
session there was a study period of about 40 minutes. Five black students, two
girls and three boys, from inner-city schools in Michigan were taught by the
author. These students were participants in the Michigan State University
Inner City Mathematics Project (MSUIC-MP). They were average and above
average students; two had finished ninth grade, one had finished tenth, and two
the eleventh grade. The students were selected on their teachers' recommendations
that they were probably of college capability.

A test of prerequisites was administered. It showed that most of the
students, especially post ninth- and tenth-grade students, needed instruction in
the prerequisites. So a 1-week unit on the prerequisites was given before
teaching the 4-week experimental unit itself.

The results on the problem sheets and the mastery tests in the pilot
study demonstrated that the unit was appropriate for an average or above
average ungraded group of high school students, but the experience gained in
teaching the unit showed some minor changes were desirable. Some problems
were added in the problem sheets and a few explanations in the booklets were
clarified. A seventh booklet with a different proof of Theorem 10 was added. In
proving Theorem 10, two students had suggested considering the set of all
common divisors of a - b and b, and showing that the sets are equal. Along
these lines booklet No. 7 was developed.
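
The students' suggestion can be checked numerically. The Python sketch below
assumes, as the passage implies but does not state, that the two sets being
compared are the common divisors of a and b and the common divisors of a - b
and b. The function names and the sample numbers are illustrative only.
Repeating the subtraction step gives a subtraction form of the Euclidean
algorithm.

```python
def common_divisors(m, n):
    """Set of positive integers that divide both m and n (m, n positive)."""
    return {d for d in range(1, min(m, n) + 1) if m % d == 0 and n % d == 0}


def gcd_by_subtraction(a, b):
    """Greatest common divisor of two positive integers by repeated subtraction."""
    while a != b:
        if a > b:
            a -= b  # (a, b) and (a - b, b) have the same common divisors
        else:
            b -= a
    return a


a, b = 84, 36
same_sets = common_divisors(a, b) == common_divisors(a - b, b)
```

Because the two sets are equal, their largest elements are equal as well, which
is why each subtraction leaves the greatest common divisor unchanged.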

Another change was made in ihe order of presentation of Theorem 9 
and the Sieve of Eratosthenes. In the pilot study the Sieve method was 
presented first, in the hope that the content of Theorem 9 would be discovered
from the Sieve method. However, students had difficulty discovering Theorem 
9 even after hints were given. It was decided to try another method in which 
Theorem 9 was done first. The second method worked much better, so it was 
used in the main study. 

The Main Study 

As pointed out, this study was based on a curriculum development
model developed by Romberg and DeVault (1967). In the main study,
conducted during the summer of 1970, 10 average and above average students,
seven girls and three boys, were taught the experimental unit by the author. As 
in the pilot study, the students were participants in the Michigan State Uni- 
versity Inner City Mathematics Project (MSUIC-MP) on the campus of 
Michigan State University. Nine were black, and one was white. Three stu- 
dents had finished the ninth grade, three the tenth, and four the eleventh grade. 
The students were selected for the summer institute on their teachers' recom- 
mendations that they were probably of college capability. 

The main study consisted of 25 sessions of about 50 minutes each.
After each session there was a study period of 30 minutes. On the first day
students were given a test on prerequisites and a pretest. As in the pilot study, the
test results showed that it was necessary to spend the first week teaching
prerequisites. Two experienced high school teachers were present at each session.
They observed and wrote a protocol of all activities, and their notes were used
in writing journals for the lessons (Libeskind, 1971, pp. 21-224).

Conclusions 

Nine students took the posttest in the main study; each of these
students was a master on the posttest as well as on the four mastery lesson tests.
The test results were used to answer the three basic questions posed earlier:

1. Can the students reproduce the proofs of the theorems? The results
of the mastery tests and posttest showed that the students were able to
reproduce the proofs.

2. Can the students understand the meaning of the theorems? The
results of the tests indicated that the students understood the meaning of the
theorems. The students were able to give numerical examples of the theorems
and apply them to divisibility facts. They were able to use some of the
theorems to carry out certain algorithms (finding if a number is prime, the Sieve
of Eratosthenes, or the Euclidean algorithm).

3. Can the students apply the methods used in proving the theorems in
the unit to solve new problems which include proving statements the students
have not seen before? The results showed that the answer to this question is in
the affirmative as well.

In regard to Question 1, the data show that the students were able to
reproduce the proofs even though they were not asked to memorize them. In
fact, the students were explicitly discouraged from memorizing the proofs. 
Thus, the ability of the students to reproduce the proofs may owe much to the 
method of instruction and the use of the heuristic approach. 

The affirmative answer given to Question 3 is of particular
significance. The ability to solve new problems and prove new statements was
considered a transfer measure of the understanding of the proofs in the unit and a
measure of the success of the heuristic method of instruction. A part of the
pretest-posttest was primarily concerned with new problems of this type.

The ability of the students to recognize if a proof is valid was another 
indication that the students understood the proofs in the unit and did not just 
memorize the steps and their reasons. These results are especially encouraging
since the proofs of theorems such as Theorems 9 and 10 and the Euclidean
algorithm, and some of the problems in the problem sheets and mastery tests,
are difficult even for college students.

The seven booklets played an important role in learning the theorems 
in the unit. Usually after completing a booklet, the students were confident 
that they knew the proof of the theorems in that booklet. 

The students enjoyed the unit. Most were active in classroom
discussions. They particularly enjoyed the application of Theorem 9 to the search for
prime numbers, the Sieve of Eratosthenes, and the application of Theorem 10 
to the Euclidean algorithm. 

The students reacted positively to the idea of mastery learning; many
remarked that they would like this procedure used in their schools. All the
students were eager to become masters the first time they took a test. The
mastery learning procedure worked well in a small class situation, where the
teacher could see the progress of each student and give individual help when
necessary.

Recommendations for Further Study

The heuristic approach used in the unit seems very promising,
although the study carried the development of the unit only into the Pilot
Examination phase of the developmental model of Romberg and DeVault (1967).
However, at the beginning a strong Hawthorne effect was evident. This effect
seemed to be due to the students' new university environment and awareness
that they were in a special project, rather than to the experimental nature of
the unit.

Thus, the findings must be subjected to further examination. The
results suggest the following recommendations for further study:

1. The development of the materials in the unit should be continued to
determine whether the unit will be effective with other groups of average and
above average students, and whether other teachers will be able to teach it
using the heuristic approach. The necessity for this is aptly pointed out by
Romberg and DeVault (1967):

Assuming that a procedure has proven to be feasible in its pilot-tryout,
the next phase is validation. The materials and methods need to be tried
out in a variety of regular classrooms with other kinds of learners, other 
kinds of teachers and in different social contexts, (p. 108) 

2. The success of this study suggests that it might be feasible to design 
and teach other material this way. Since the students particularly enjoyed the 
applications of Theorems 9 and 10, the extended unit should include
congruences, divisibility tests by 3, 9, and 11, Fermat's theorem, and some
number-theoretic functions.

3. It would be valuable to find out if the heuristic approach used in the
study could also be used in the elementary school. King (1970) designed a unit
on proof for sixth grade and showed that the students were able to apply the
theorems in his unit to simple divisibility facts and to reproduce the proofs of
these theorems. The students in King's study were drilled on the proofs of the
theorems. They were able to reproduce the proofs, but they were only asked to
prove three statements they had not seen before. It would be interesting to find
out whether the heuristic approach used in the present unit would enable
sixth-grade students to reproduce the theorems with less drill, and result in
transfer and successful problem solving on a higher level.

4. The effectiveness of the heuristic approach suggests that developing
materials using such an approach for high school and college classes could be 
worthwhile. 




Chapter 5 



An Exploratory Study on the 
Diagnostic Teaching of Heuristic 
Problem-solving Strategies in Calculus 

John F. Lucas 



In a 1976 paper on basic mathematical skills prepared for the National
Institute of Education by the National Council of Supervisors of Mathematics
(1976), the following statement is especially noteworthy: "The main reason
for studying mathematics is to learn to solve problems."

The study outlined in this chapter was conducted 6 years earlier
(Lucas, 1972), but it was motivated by precisely the same assumption. If the
assumption is true, researchers in mathematics education and mathematics
teachers ought to be searching for ways to improve learning, teaching, and
communicating mathematical problem solving. This can be accomplished
most effectively through collaborative efforts of researchers and teachers. The
major difficulty is where to start looking.

In conceiving this study, the writer focused on what he saw to be the
psychological core of mathematical problem-solving, i.e., heuristics. Heuristics
are higher-order, tentative, general decision processes which help organize
and narrow the search for a problem solution. Drawing diagrams, separating
information, reasoning backwards, recognizing and using analogies, searching
for patterns, successive approximation, checking, and exploiting problem
symmetry are some examples of heuristic behavior. These actions are tentative in
that they do not guarantee success; they are general in that they apply across
specific problems and classes. In contrast, processes such as applying the
quadratic formula to solve certain equations or Gauss-Jordan elimination to reduce
a matrix are algorithmic, since they are always successful when correctly
applied and since they apply only to specific kinds of problems. Heuristics, on the
other hand, transcend classes of problems and are a unifying element in the
study of problem-solving. Information about teaching and learning heuristics
is critical for understanding the entire process of solving mathematical
problems and its relationship to teaching and learning mathematics. This
assumption played a significant role in the development of this study.

At the time of the study reported here, few research studies had been 
concerned specifically with heuristics. The mathematician George Polya 
(1948, 1954a, 1954b, 1962, 1965) provided a great quantity of information on
heuristics in many interesting mathematical problems and discussions. Polya
has furnished mathematics educators with material to be tested in many
dissertation studies, including this one. Essentially he condensed his own
experiences and those of other writers, distilled the key ingredients of
mathematically oriented mental processes in problem-solving, and arrived at an array of
cally oriented mental processes in problem-solving, and arrived at an array of 
heuristic processes. Polya's writings demonstrate the utility and effectiveness 
of heuristics in the hands of skilled problem solvers. His observations led to an
inquiry-oriented model for teaching mathematics, where the teacher asks heu- 
ristic questions and makes suggestions, and the learner develops self-direction 
by asking the same questions while attempting to solve problems. 

Although the heuristics identified by Polya are important objects for 
research, it is difficult to gather evidence about them since they must first be- 
come observable actions, and there must be a system for recording their occur- 
rence and making measurements. Traditional paper-and-pencil tests are 
clearly inadequate for observing the problem-solving process or measuring its 
content. However, the problem of direct observation is largely overcome by 
requiring the problem solver to think aloud as he or she works. This technique 
was used by Duncker (1945), by information-processing theorists (see 
Newell, Shaw, & Simon, 1958), and more recently by Soviet investigators 
(see Krutetskii, 1969). 

In the late sixties, Jeremy Kilpatrick (1968) developed a system for 
coding problem-solving actions and events observed from tape-recorded
thinking-aloud protocols. In Kilpatrick's system, symbols representing behaviors
and events were recorded in the same sequence as those events actually had 
occurred. Kilpatrick called this a process-sequence code. Using this system, an 
observer could record a time-exposure snapshot of problem-solving actions. In 
the study outlined here, Kilpatrick's system was considerably modified with
respect to content, but its structure nevertheless provided the basic idea for 
gathering data. 

Lacking a firm theoretical base on which to build, the study reported
here was planned as an exploratory study. It was partly influenced by the
"teaching experiment" style of Soviet problem-solving research. Research on
problem-solving processes in mathematics was in its infancy, with a great deal
of exploratory work, observation, treatment variation, data probing, and
conjecture needed before rigorous experimentation could be executed (Kilpatrick,
1970). In 1970, therefore, exploratory work aimed at the mundane task of
developing better behavioral analysis instruments would be more beneficial than
ambitious attempts to define effective teaching or learning of problem-solving.
The major objective of this study was simply to develop conjectures to
help set a course for further investigation.

The study took place in a first-semester calculus course (differential
and integral calculus of real-valued functions of one variable). This setting
was chosen because student subjects were accessible to the investigator, and
because the calculus offers a challenge in terms of integrating problem-solving
techniques with standard content. The plan was to conduct a clinical
diagnostic teaching experiment in which as many of Polya's heuristics as feasible
would be introduced to freshman calculus students. The practice (treatment)
problems were calculus problems; the experimental (test) problems were
general nonroutine mathematical problems. Observation was conducted in audio
tape-recorded individual sessions in which subjects thought aloud while
solving problems. Protocols were analyzed through a system modified from
Kilpatrick's and developed specially for this study. The data were probed; outcomes
were conjectures pertaining to the following researchable questions:

1. Is it possible to teach heuristics and produce observable effects? If
so, what is the nature of the effects:

a. Are there strategy shifts? Is there an increased frequency or a
change in emphasis on heuristics?

b. Are there changes in problem-solving performance? Are there
effects on time, accuracy, completeness, difficulty, or errors?

2. Can Kilpatrick's system of behavior analysis be adapted for
observation of college students?

3. Is it possible to devise a reliable modification of Kilpatrick's system?

4. Can heuristics be integrated into college calculus without sacrificing
course content?

The author believed that answers to these questions would help guide
researchers, generate ideas for classroom teachers, and improve communication
of mathematical problem solving. It is the purpose of this paper to describe a
heuristic teaching experiment conducted by the author in 1970, a method for
analyzing problem-solving processes, a summary of tentative conclusions, and
further work in research and curriculum development undertaken by the
author as a consequence of this study. It is not the purpose of this paper to
emphasize inferences that might be drawn from the data. The reader interested in
greater detail is directed to the corresponding dissertation or journal report
(Lucas, 1972, 1974).



The Study: Structure and Design 

The main study was preceded by a pilot study in spring 1970 with six
volunteer students who were taking a second-semester calculus course from the
author. These students were given a set of 15 mathematical problems, mostly
rate and optimization problems in differential calculus. They were asked to
solve the problems while thinking aloud. Several interview sessions were
required in all but one case, and the total recorded time averaged 4.25 hours per
subject. The purposes of the pilot study were to familiarize the investigator
with the interview procedure, provide process-sequence samples for coding
practice and revision, and determine which heuristics were most likely to be
observed in student problem-solving at that level. As a consequence of the pilot
study, the coding system and interview format were revised several times in
preparation for the main study.

Table 1
Design of the Study

Group    N    I (Pretest)    II (Instruction)    III (Posttest)

H1       8    O              H                   O
C1       6    O              no-H                O
H2       9    no-O           H                   O
C2       7    no-O           no-H                O

O = diagnostic observation.
H = instruction on heuristics.

In fall 1970, 30 university students from two first-semester calculus
classes taught by the investigator participated as unpaid volunteers in the
study. The study was executed in three phases during the 14-week term.
Phases I and III were 2-hour diagnostic observation interview sessions
(testing) with individual subjects. Each series of interviews (pre- and posttests)
lasted about 2 weeks. Phase II was an 8-week instructional treatment. A
Solomon four-group design (Campbell & Stanley, 1969) involving two
treatment conditions (explicit heuristic instruction vs. no explicit heuristic
instruction) and two testing conditions (pre- and posttests vs. posttest only) was used.
Table 1 illustrates this design.

One class was taught using explicit emphasis on heuristic techniques, 
and the other served as control with no explicit reference to heuristics. Subjects
were not randomly assigned to treatments, but appeared in a treatment group 
by registering for that section of the course with no foreknowledge of an exper- 
iment. The two groups were taught one after the other each morning. The 
total number of subjects (30) was small because of the individualized nature 
of the interviewing sessions and the amount of time required for analysis and 
coding of observations from 44 2-hour sessions. Background information, in- 
cluding age, sex, class, intended major, semesters of high school mathematics, 
grade point average, and ACT mathematics percentile rank, was obtained for 
each subject. On these particular traits, all four groups were very similar. 

During Phase I, 14 subjects participated in problem-solving interview
sessions. Each subject was given a booklet containing instructions, two sample
problems, and seven test problems. These problems were general
(noncalculus, except for the last two), and each had several potential solutions. Two
examples are given.

Problem 1 (Pretest) The speed of sound in an iron rod is 16,850 ft/sec,
and the speed of sound in air is 1100 ft/sec. If a sound originating at one
end of the rod is heard 1 second sooner through the rod than through the
air, how long is the rod?

Problem 2 (Pretest) A real estate agency offers you a choice of two 
triangular pieces of land. One piece has dimensions 25, 30, and 40 feet; 
the other has dimensions 75, 90, and 120 feet. The price of the larger
piece is 5 times the price of the small piece. Which is the better buy?
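
One of the several possible solutions to Problem 2 can be sketched in Python.
This is an illustration only, not a solution reported in the study: it assumes
Heron's formula for the areas and that the value of a piece is proportional to
its area, and the function name is hypothetical. Because each side of the
larger piece is 3 times the corresponding side of the smaller, the triangles
are similar and the areas should differ by a factor of 9.

```python
import math


def heron_area(a, b, c):
    """Area of a triangle with side lengths a, b, c (Heron's formula)."""
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


small_area = heron_area(25, 30, 40)
large_area = heron_area(75, 90, 120)

area_ratio = large_area / small_area  # similar triangles, scale factor 3
price_ratio = 5                       # stated in the problem

# The larger piece is the better buy if it gives more area per unit of price.
larger_is_better_buy = area_ratio > price_ratio
```

Nine times the land for five times the price makes the larger piece the better
buy under these assumptions.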

The subject was told to solve the problems using pencil and paper, but 
to think aloud while working. These remarks were tape-recorded, and the 
interviewer noted various observations. Interaction between interviewer and
subject was minimal, except for an occasional reminder to think aloud
whenever the subject lapsed into silence. Retrospective comments by the subject or
feedback from the interviewer about correctness and quality of the solutions
were avoided. For each interview, the record of thinking aloud, the subject's
written work, and the interviewer's notes formed a collage of information
representing the problem-solving process. This information was studied and
reduced to a checklist, process-sequence code, and score, which are described
next.

Behavioral Analysis: A Coding System 

Integrating the structure of Kilpatrick's behavioral analysis (1968), a
model of heuristics for solving mathematical problems, and the experience and
observations of the pilot study, the investigator created an extension of
Kilpatrick's system which included those of Polya's heuristics observed during the
pilot study. This new system consisted of a checklist (see Figure 1), a
process-sequence code (see Table 2), and provisions for scoring various aspects of
time consumption and general performance. The checklist categories included
several heuristics not represented in the process-sequence code, but the major
function of the checklist was to provide a more detailed analysis of some
heuristics and events which were assigned process-sequence codes. In addition,
three measures of time and four measures of score were taken for each
problem.

Of particular interest in the study were heuristic strategy shifts,
changes in nature or frequency of heuristics, and changes in problem-solving
performance. To detect these changes, a system of behavioral analysis was
designed to record and evaluate many actions which could occur during a
problem solution. The kind of notation used, the number of diagrams drawn,
whether or not the diagrams accurately represented problem conditions, the
number and kinds of diagram modifications, and whether or not the subject
recalled a related problem or applied its method or result were examples of
some of these activities. The frequency of checking was measured by a
process-sequence code; the kind of checking was classified by seven checklist
categories. Similarly, two process codes were used to distinguish and sort
errors of structure and execution; the checklist further distinguished four
categories of executive errors. Instances in which errors were noticed and corrected
were also counted. Strategies by which a solution is produced (e.g., analysis,
synthesis, trial and error, reasoning by analogy) had corresponding
process-sequence codes. The by-products of the solution (e.g., equations, relations,
and algorithmic processes) were also recorded. Another process code was used
if the subject was observed separating or summarizing problem data. The
system also had codes for looking-back behaviors such as condensing or outlining
a solution process, trying a different mode of attack, or inventing a new
problem related to the given one. Still other code symbols indicated difficulty,
namely, hesitating, rereading the problem, and stopping short of a solution.
When the composite picture reconstructed from a tape-recorded vocalized
protocol was examined very carefully, little observable behavior was likely to
escape scrutiny. There were 25 checklist categories in the original system and
20 process-sequence codes. The latter are presented in Table 2.

CODING FORM (FINAL VERSION)

Subject No.
Problem No.
Coder
Date
Tape No.
Tape Readings

Approach
    restates problem in own words
    mnemonic notation
    representative diagram (yes / no)
    auxiliary line(s)
    enlarges focal points

Production
    recalls related problem
    uses method of related problem
    uses result of related problem
    inductive reasoning (pattern search)

Looking Back
    routine check of manipulations
    is result reasonable?
    all information used?

Checking
    test for symmetry
    test of dimensions
    specialization (extreme cases)
    comparison with gen. known result

Time:   exc. looking back
        looking back
        total

Score:  approach
        plan
        result
        total

    condenses/outlines process
    tries to derive differently
    variation by analogy
    variation by changing conditions

Executive errors (tally)
    algebraic manipulation
    numerical computation
    differentiation
    other

Interviewer Comments

PROCESS SEQUENCE

Figure 1. Checklist categories coding form.

Table 2
Process-sequence Codes

Code symbol    Observed behavior

R              Reads problem
S              Separates/summarizes data (wanted vs. given; relevant vs.
               irrelevant)
Mf             Model introduced via figure, diagram, schematic
Mf*            Modification of existing figure (auxiliary lines; enlargement;
               darkening, etc.)
Mc             Model introduced via figure with coordinate system
DS             Deduction by synthesis (working forward)
DA             Deduction by analysis (working backward)
T              Trial and error (successive approximation)
An             Reasons by analogy (using methods, results, ideas from problems
               similar in structure)
Me             Model introduced via equation(s) or other algebraic relationship
Alg            Algorithmic process
N              Nonclassifiable behavior (mumbling, incomplete statements,
               random guessing, etc.)
C              Checks result
Vs             Varies the process (attempts alternate attack)
               Varies the problem (invents new related problem)
X              Structural error (misinterpretation; misrepresentation)
i              Executive error (manipulative error; miscalculation)
•              Error explicitly corrected
-              Hesitation of two units (30 seconds)
/              Stops without solution

In applying the codes listed in Table 2, parentheses were used to
cluster subprocesses related to a more general process, commas separated
processes, and a period denoted a completed solution. Outcomes of processes
were indicated by numerals 1 through 5, respectively, for abandons process,
impasse, incorrect final result, correct final result, and intermediate result. To
illustrate its appearance, the coding string

R, Mf, -, DS(Me,Alg)5, DA(Alg)5, C, DS(Me,Alg)4, C.




would be translated as follows: The subject read the problem (R), drew a
figure (Mf), hesitated at least 30 seconds (-), started putting information
together (DS) to yield an equation (Me) which was solved by a standard
technique (Alg) to obtain an intermediate result (5). Next, the subject looked at
the goal and asked what was needed to obtain that (DA), followed by a brief
calculation (Alg) in which a mechanical error (♦) was made. Upon checking
back (C), the error was discovered and corrected (*), and the subject
proceeded in a forward manner (DS) to derive another equation (Me) which
was solved (Alg) to produce a correct final solution (4). This solution was
verified by checking (C) against the conditions of the problem.
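
The notation described above is regular enough to be decoded mechanically. The
Python sketch below is a hypothetical illustration, not part of the study's
procedure: it handles only letter codes, the hesitation mark, parenthesized
subprocesses, outcome numerals, and the closing period, and it leaves out the
error and correction marks. The function names and the sample string are
assumptions made for the example.

```python
import re

# A subset of the codes in Table 2, with short descriptions.
CODES = {
    "R": "reads problem",
    "S": "separates/summarizes data",
    "Mf": "model via figure",
    "Me": "model via equation",
    "DS": "deduction by synthesis",
    "DA": "deduction by analysis",
    "T": "trial and error",
    "An": "reasons by analogy",
    "Alg": "algorithmic process",
    "N": "nonclassifiable behavior",
    "C": "checks result",
    "-": "hesitation (30 seconds)",
}

OUTCOMES = {
    1: "abandons process",
    2: "impasse",
    3: "incorrect final result",
    4: "correct final result",
    5: "intermediate result",
}

# One step: a code, optional parenthesized subprocesses, optional outcome 1-5.
STEP = re.compile(r"([A-Za-z]+|-)\s*(?:\(([^)]*)\))?\s*([1-5])?")


def split_top_level(text):
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def decode(coding_string):
    """Return (steps, completed); each step is (code, subprocesses, outcome)."""
    text = coding_string.strip()
    completed = text.endswith(".")  # a period denotes a completed solution
    if completed:
        text = text[:-1]
    steps = []
    for part in split_top_level(text):
        match = STEP.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse step: {part!r}")
        code, subs, outcome = match.groups()
        sub_codes = [s.strip() for s in subs.split(",")] if subs else []
        steps.append((code, sub_codes, int(outcome) if outcome else None))
    return steps, completed


def describe(step):
    """Render one decoded step as a short phrase."""
    code, subs, outcome = step
    text = CODES.get(code, "unknown code")
    if subs:
        text += " [" + "; ".join(CODES.get(s, "unknown code") for s in subs) + "]"
    if outcome is not None:
        text += " -> " + OUTCOMES[outcome]
    return text


EXAMPLE = "R, Mf, -, DS(Me,Alg)5, DA(Alg)5, C, DS(Me,Alg)4, C."
steps, completed = decode(EXAMPLE)
```

Calling `describe` on each element of `steps` yields a step-by-step reading of
the string similar to the translation given in the text.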

Using the system demonstrated above, in combination with a checklist 
for further clarification, the investigator was able to record the evolution of the 
problem solution so that a much clearer picture could be obtained than that
afforded by written work alone. 

Evaluation of problem-solving performance included measures of time, score,
difficulty, and errors. Time was measured in unit intervals of 15 seconds
each, and three time measures were taken: time excluding looking back (a
performance measure), time looking back (a heuristic measure), and total time
(sum of the two). The solution score was split into four weighted parts,
corresponding to approach (subject demonstrated understanding of problem, 1
point), plan (subject derives information sufficient to solve problem
correctly in the absence of executive errors, 2 points), result (subject
establishes correct and complete result, 2 points), and total score (sum of
approach, plan, and result scores, 0-5 points). Difficulty was inferred from
the frequency of hesitation, rereading of the problem, impasses, and stopping
without the solution. Finally, errors were subdivided into two types,
structural and executive, and the latter were further subdivided in the
checklist. These measures helped produce a composite picture of the problem
solver's general performance.
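
As a rough illustration of how the time and score measures combine, the sketch below records one solution attempt. The class name, field names, and the decision to round partial intervals up are assumptions made for the example; the 15-second unit and the 1, 2, and 2 point weights come from the description above.

```python
import math
from dataclasses import dataclass

UNIT_SECONDS = 15  # time was measured in unit intervals of 15 seconds


@dataclass
class SolutionRecord:
    """One subject's performance on one problem (hypothetical record layout)."""
    seconds_solving: float       # time excluding looking back
    seconds_looking_back: float  # time spent looking back
    approach: int                # 0-1 points
    plan: int                    # 0-2 points
    result: int                  # 0-2 points

    def __post_init__(self):
        limits = {"approach": 1, "plan": 2, "result": 2}
        for name, top in limits.items():
            if not 0 <= getattr(self, name) <= top:
                raise ValueError(f"{name} score must lie between 0 and {top}")

    @staticmethod
    def to_units(seconds):
        """Convert seconds to 15-second units, rounding up (an assumption)."""
        return math.ceil(seconds / UNIT_SECONDS)

    @property
    def total_time_units(self):
        # Total time is the sum of the two component time measures.
        return self.to_units(self.seconds_solving) + self.to_units(self.seconds_looking_back)

    @property
    def total_score(self):
        # Total score is the sum of approach, plan, and result (0-5 points).
        return self.approach + self.plan + self.result


record = SolutionRecord(seconds_solving=200, seconds_looking_back=45,
                        approach=1, plan=2, result=1)
print(record.total_time_units, record.total_score)
```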

The system of behavioral analysis described above was applied to the 
observational data from all interview sessions of Phases I and III, having 14
and 30 sessions, respectively. This analysis required approximately 400 hours 
of work and was carried out in the semester following the completion of the 
study. 

The Instructional Program 

Phase II, the instructional program and the heart of the study, spanned a
period of 8 weeks, or forty 50-minute class periods, from September 30, 1970
to November 25, 1970. During this phase, calculus topics in both classes
included limits, continuity, the derivative, differentiation, applied problems
involving derivatives, and the antiderivative. These concepts and related
mathematical information were introduced in the same expository manner in both
classes. The differences between modes of classroom teaching centered on the
nature of problem assignments, the depth of discussion of problem solutions,
the grading of problem solutions, and the explicitness of reference to
heuristics.

All problem assignments for the control class were made from the textbook
(Leithold, 1968). Since answers to problems were available to students, the
control class assignments were not graded. However, a part of each class
period was set aside for answering students' questions about problems.

In contrast, the experimental class was assigned drill exercises from the
textbook and supplementary sets of calculus problems not included in the
text. These supplementary sets, averaging five problems each, were prepared
and sequenced in advance of the course to highlight and reinforce certain
heuristics and correspond to classroom topics. Homework problems from the
supplementary assignments were graded several times each week, and the
instructor's written comments included heuristic suggestions. The grading
system itself was designed to reward use of heuristics. For example, outlining
key points of a solution, producing alternate solutions, or posing related
problems received extra credit.

During the instructional period, the control class was assigned about 20%
more problems than the experimental class, but problem solutions were
discussed differently in each class. The instructor responded to students'
questions in the control class, whereas in the experimental class he guided
their questions by raising issues and making suggestions intended to draw
attention to heuristics.

Explicit introduction of heuristics was avoided in the control class and 
emphasized in the experimental class. A set of 12 "Heuristic Papers" was
prepared by the author in advance of the study. These were distributed to the 
experimental class as additional reading at intervals throughout Phase II. 
Each heuristic paper made one or more heuristic techniques explicit through 
carefully constructed applications to both calculus and noncalculus problems. 
Historical comments and discussion of the value and effectiveness of heuristic 
techniques in mathematical reasoning were emphasized in each paper. A list 
of the titles of these papers appears in Table 3. 

There was also a difference in philosophy of instruction between classes. The
author believed that more effective teaching of heuristics would occur if a
few problems were analyzed thoroughly than if many problems were discussed
superficially. This point had been emphasized by Larsen (1960). Problem
discussions in each class reflected this difference.

Another philosophical position that separated experimental from control
instruction was the emphasis on asking the questions "why?" and "what
if . . . ?" Buck ( ) stressed the importance of this attitude in teaching
mathematics. As a consequence, students in the experimental class were
encouraged to be active participants in the problem-solving process rather
than passive spectators, to explore and speculate rather than formalize, and
to work in teams as well as independently. Voluntary 2-hour problem sessions
each week, led by the investigator, were available to both the control and
experimental class members.

Table 3
Heuristic Papers

A    The Nature of Heuristics
B    Polya's Question List
1    Analysis-Synthesis
2    Method-Result Heuristic
3    Looking Back
4    Drawing Diagrams
5    Understanding the Problem
6    Checking
7    Reasoning by Analogy
8    Setting Up Equations
9    Induction
10   Sketch of a Solution Process: A Summary

For objectivity and diagnosis of teaching, a daily log was kept on each class.
This log included notes on topics, specific examples, problems, heuristic
suggestions, and questions posed by the teacher and by the students. When the
student-teacher interaction in each setting was analyzed and compared,
differences which were planned to be sharp became blurred; they were more a
matter of degree. Problems were discussed in both groups, but the discussion
provided instruction on heuristics in the experimental setting, while it
reinforced concepts in the control class. Questions were asked of both groups,
but they embodied general heuristic suggestions in the experimental class and
were directed to specific points in the control class. Students were active
participants in both groups, but experimental students were encouraged to
explore, conjecture, and guess, while control students had to formulate their
own questions without being asked thought-provoking questions. The teacher
guided activities in both groups, but laterally in the experimental class and
centrally in the control class. These were the distinctions between an
instructional process which emphasized heuristics and one which did not.

Immediately after the instructional phase, a series of 2-hour interview
sessions (Phase III diagnostic observations) was administered to 30 subjects,
the 14 who had participated in Phase I and 16 additional volunteers from the
complementary sets of students in the two classes. The two new groups
exhibited similar measures on the traits used to compare the original Phase I
groups. The Phase III format was identical to Phase I except that the seven
test problems were different, though several were structurally similar. Again,
test problems were noncalculus oriented, except for two. Two examples taken
from the posttest are given:

Problem 1 (Posttest). A man fires at a target, and 2 seconds after he fires,
he hears the sound of the bullet striking the target. If the bullet travels at 
a speed of 1,100 ft/sec, how far away from the man is the target? 

Problem 3 (Posttest). A circle whose center is at the point (4,6) is tan- 
gent to the line 3x = y - 4. What is its area? 

The posttest phase took 2 weeks. During the spring of 1971, all observational
data from recorded interview sessions in both Phases I and III were assembled
and analyzed by the system described earlier in this chapter. Before data
analysis, modifications in the coding system had to be made as a consequence
of testing for intercoder reliability, which is described next.

Reliability of the Coding System 

The system of behavioral analysis which emerged from the pilot study had 52
different classifications: 20 process-sequence codes, 25 checklist categories,
and 7 time/score performance measures. Nineteen of these classifications
represented actions or events which were clearly defined and easy to identify
or evaluate. For example, time measurements, reading the problem, drawing a
diagram, and producing an equation are actions on which several coders would
generally agree. On other classifications such as differences in modes of
checking, the nature of errors, whether a subject was reasoning forward or
backward, and assignment of a performance score, one might not expect close
agreement by different judges.

To test the general reliability of the system, a faculty colleague in the
Mathematics Department at the University of Wisconsin-Oshkosh was trained as
an alternate coder. This training was focused on 33 potentially ambiguous
classifications, under the assumption that if reasonable agreement between
coders could be attained on the latter, then including the 19 clearly defined
classifications would not reduce the degree of agreement. The potentially
ambiguous classifications included all 25 checklist categories, the
process-sequence codes DS (working forward), DA (working backward),
T (trial and error), and i (structural error), and four performance score
measures of approach score, plan score, result score, and total score.

After a 1-week training period, 38 posttest protocols were coded by the
alternate coder over a period of 5 days. These were compared with the same
protocols coded by the investigator. The following three tests were applied to
decide which classifications should survive for inclusion in the final system:
(a) frequency of observation; (b) coder bias (treatment effect estimates of
coder differences and score); and (c) intercoder agreement.

Behaviors such as inductive reasoning, test for symmetry, and in- 
venting new problems were dropped due to low frequency of observation. The 
relative absence of these behaviors in the protocols of college students made the 
investigator curious. Are certain heuristics problem-specific or class-specific? 
Perhaps the structure of the problem or the nature of the question evokes pat- 
tern-search (induction). Also, are looking-back actions (e.g., inventing new 
problems and trying alternate approaches) a function of student habit or does 
their absence relate to a situation variable like the nature or length of an inter- 
view session? 

Certain classifications in the checklist, such as checking and executive
error categories, were intended to refine components of the coding scheme
(C, i). It was thought that clustering these categories together to produce a
more macroscopic system would yield increased intercoder reliability.
Paradoxically, this was not the case. Clustering four checking classifications
having indices of agreement .81, .69, 1.00, and .69, respectively, produced a
single checking variable with index .75. Similarly, clustering executive error
categories with indices .84, .96, .85, and .79 led to a single variable for
executive errors whose index was .83. These results indicated that a
well-trained coder familiar with the mathematical setting of the problems can
discriminate adequately among certain closely related behaviors.

To test intercoder agreement on process-sequence behaviors, the coders'
strings of symbols were used along with coder notes on interpretation and
rationale to obtain frequencies of agreement and disagreement. An agreement
was tallied when the same action (as described in the coders' notes) was
symbolized identically by both coders. A mismatch or disagreement was scored
if an action was symbolized differently or missed by one coder. Using this
system the frequency pairs (frequency of agreement, frequency of disagreement)
for the four process-sequence behaviors were:

DS (working forward) (146, 29)

DA (working backward) (33, 16) 

T (trial and error) (41, 4) 

^ (structural error) (37, 5) 
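
The text does not state how an index of agreement was computed from pairs like these. One common choice, assumed here purely for illustration, is the proportion of agreements among all tallied actions. The sketch below applies that assumption to the four pairs listed above; the function name is hypothetical and the resulting values are not figures reported by the study.

```python
def proportion_agreement(agreements, disagreements):
    """Share of tallied actions on which both coders used the same symbol."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no observations to compare")
    return agreements / total


# (frequency of agreement, frequency of disagreement) as listed in the text
frequency_pairs = {
    "DS (working forward)": (146, 29),
    "DA (working backward)": (33, 16),
    "T (trial and error)": (41, 4),
    "structural error": (37, 5),
}

indices = {
    name: proportion_agreement(agree, disagree)
    for name, (agree, disagree) in frequency_pairs.items()
}
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Under this assumption, working backward (DA) would show the weakest agreement of the four behaviors.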

To determine intercoder reliability on the four performance score measures,
the frequencies of agreement or disagreement corresponded to the number of
observations in which the numerical point scores of both coders in each
category were equal or unequal. The respective coefficients for approach
score, plan score, result score, and total score were .94, .93, .97, and .96.
Thus, the two coders agreed consistently on all aspects of score performance.

Table 4
Classification System

Uses mnemonic notation
Representative diagram-yes
Representative diagram-no
Recalls related problem
Uses method of related problem
Uses result of related problem
Routine check of manipulations
Checks if result is reasonable
Checks if all information used
Checks for appropriate dimensions
Makes algebraic manipulation error
Makes numerical computation error
Makes differentiation error
Other errors

Time: excluding looking back
Time: looking back
Time: total
Score: approach
Score: plan
Score: result
Score: total

Reads problem (R)
Separates/summarizes data (S)
Draws diagram (Mf)
Diagram with coordinate system (Mc)
Synthetic deduction (DS)
Analytic deduction (DA)
Trial and error (T)
Reasons by analogy (An)
Produces equation/relation (Me)
Algorithmic process (Alg)
Nonclassifiable (N)
Checks solution (C)
Makes structural error (T)
Makes executive error (i)
Notices/corrects error (*)
Hesitates; two units (-)
Stops without solution (/)


After making adjustments to the coding scheme, 38 classifications emerged to
form the system used in analyzing problem-solving protocols. These are listed
in Table 4.

Analysis of Data 

Each behavior/event appearing in Table 4, with the exception of the three
time scores, was converted to a dichotomous variable for analysis. Depending
on symmetry or asymmetry of the distribution, the criterion for
dichotomization was set either at the integer nearest the median of a
variable's posttest frequency distribution or on the basis of simple presence
or absence of the behavior. Frequency scores were obtained by summing
observations of the behavior for each subject across the seven posttest
problems. Response level 0 meant absence of the behavior or frequency below
the median; response level 1 meant presence of the behavior or frequency above
the median.
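
A minimal sketch of this dichotomization follows. The function name and sample frequencies are hypothetical, and two details the text leaves open are filled by assumption: a median ending in .5 is rounded up, and a frequency exactly equal to the criterion is assigned level 1.

```python
import math
from statistics import median


def dichotomize(frequencies, presence_only=False):
    """Map per-subject frequency totals to response levels 0 or 1."""
    if presence_only:
        # Simple presence (1) or absence (0) of the behavior.
        return [1 if f > 0 else 0 for f in frequencies]
    # Criterion: integer nearest the median (halves rounded up, an assumption).
    cut = math.floor(median(frequencies) + 0.5)
    # Assumption: a frequency equal to the criterion counts as level 1.
    return [1 if f >= cut else 0 for f in frequencies]


# Hypothetical totals for seven subjects, each summed over seven problems.
totals = [0, 1, 2, 2, 3, 5, 7]
print(dichotomize(totals))
print(dichotomize(totals, presence_only=True))
```

The presence_only switch stands in for the second criterion mentioned above, which suits behaviors whose distribution is too skewed for a median split.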

Two types of χ² analyses were used to treat dichotomous data. First, posttest
data were entered in two-way contingency tables for standard analysis of main
effects due to treatment. The data were explored further by entering pretest
and posttest information on 14 subjects into three-way contingency tables and
applying a method of logit analysis suggested by Goodman (1969, 1970, 1971).
This analysis was used to explore potential main effects and interactions of
pretesting condition or pretest response level with treatment condition and
posttest response level. To illustrate the nature of these interactions,
sample three-way contingency tables are given in Tables 5 and 6.
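
The two-way test of treatment effects can be sketched with the standard shortcut formula for a 2 × 2 table. The function below is an illustration rather than the investigator's procedure; its name and argument order are assumptions. The cell counts in the usage lines are taken from two rows of Table 7 (heuristic group 8 and 9, control group 5 and 8, reported as .222; and 1, 16, 6, 7 with Yates' correction, reported as 4.617).

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square (1 degree of freedom) for the table [[a, b], [c, d]].

    a, b: heuristic group at posttest response levels 0 and 1
    c, d: control group at posttest response levels 0 and 1
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction: reduce |ad - bc| by n / 2.
        diff = diff - n / 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * diff ** 2 / denominator


print(round(chi_square_2x2(8, 9, 5, 8), 3))               # 0.222
print(round(chi_square_2x2(1, 16, 6, 7, yates=True), 3))  # 4.617
```

With one degree of freedom the conventional .05 critical value is 3.841, which is consistent with a value of 3.833 being reported at p < .06 rather than p < .05.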

Note that three-way interactions in a three-way table are equivalent to
two-way interactions in a 2 × 2 factorial arrangement, with the dependent
measure being a dichotomous posttest response level. Analogously, two-way
interactions in a three-way table are equivalent to main effects in the
factorial perspective.

Nondichotomous data (e.g., time scores) were treated by analysis of
variance and covariance using standard F-tests (Winer, 1962). 

Although the statistical analysis of data described above lends an appearance
of hypothesis-testing, this was an exploratory clinical study. Various
hypotheses were devised, but the statistical tests were applied primarily to
provide insight into which behaviors might be worth pursuing in further work.
Six hypotheses formed the framework of the investigation at this point. Five
of these hypotheses were concerned with pretest × treatment interaction, main
effect due to pretesting, pretest response level, main effect due to
treatment; the posttest scores were used as dependent measures for each of
these hypotheses. The sixth hypothesis concerned the general effect of
treatment on combined posttest scores. Where interactions were shown to exist,
main effects were not explored further. However, if the data reflected an
apparent difference between experimental and control groups, and if there were
no apparent interactive effects of pretesting or pretest response level with
treatment, or main effect of treatment upon pretested groups, then the
behavior variable under consideration was regarded as potentially influenced
by treatment alone and as meriting further experimentation.

The number of subjects in each treatment condition and posttest response
level for 35 dichotomous behavior and event variables are given in
Table 7.

Table 5
Sample Three-way Contingency Table
(N = 30)

            Pretested                        Unpretested
  Experimental       Control         Experimental       Control
    0      1        0      1           0      1        0      1
  f111   f112     f121   f122        f211   f212     f221   f222

Note. Pretesting Condition × Treatment × Posttest Response Level (dependent).
f_ijk = observed frequency in cell (i, j, k).

Table 6
Sample Three-way Contingency Table
(N = 14)

            Pretested                        Unpretested
  Experimental       Control         Experimental       Control
    0      1        0      1           0      1        0      1
  f111   f112     f121   f122        f211   f212     f221   f222

Note. Pretest Response Level × Treatment × Posttest Response Level (dependent).
f_ijk = observed frequency in cell (i, j, k).

Table 7
Frequency of Ss in Each Treatment Condition and
Posttest Response Level for 35 Behavior Variables

                                           Heuristic group     Control group
                                           (posttest           (posttest
                                           response level)     response level)
Variable                                       0      1          0      1     χ²          p

Mnemonic notation                              6     11         12      1     9.997       p < .005
Representative diagram (yes)                   8      9          5      8     .222
Representative diagram (no)                   10      7          3     10     3.833       p < .06
Recalls related problem                       10      7          9      4     .041 (Y)
Uses method of related problem                 1     16          6      7     4.617 (Y)   p < .05
Uses result of related problem                 8      9         12      1     4.904 (Y)   p < .05
Routine check of manipulations                 9      8          6      7     .136
Is result reasonable                           7     10          9      4     2.330
All information used                          11      6         10      3     .103 (Y)
Test by dimensions                             6     11          7      6     1.033
Algebraic manipulation error                   7     10          6      7     .074
Numerical computation error                    8      9          3     10     .938 (Y)
Differentiation error                          9      8          8      5     .222
Other errors                                   8      9          7      6     .136
Score: approach                                2     15          8      5     6.126 (Y)   p < .025
Score: plan                                    5     12          9      4     4.693       p < .05
Score: result                                  7     10         10      3     3.833       p < .06
Score: total                                   5     12          9      4     4.693       p < .05
Rereads problem (R)                           15      2          5      8     6.126 (Y)   p < .025
Separates/summarizes data (S)                  4     13          9      4     6.266       p < .02
Draws diagram (Mf)                            10      7          8      5     .023
Draws diagram with coordinate system (Mc)      3     14          3     10     .008 (Y)
Synthetic deduction (DS)                       8      9          5      8     .222
Analytic deduction (DA)                        5     12          8      5     3.096       p < .10
Successive approximation (T)                  12      5          7      6     .314 (Y)
Reasoning by analogy (An)                     13      4         13      0     1.022 (Y)
Equation/relation (Me)                         8      9          8      5     .621
Algorithmic process (Alg)                      8      9          6      7     .002
Nonclassifiable (N)                           13      4          4      9     6.266       p < .02
Checking (C)                                   6     11          7      6     1.033
Structural error (T)                          11      6          6      7     1.033
Executive error (i)                            9      8          9      4     .814
Error noticed and corrected (*)                5     12          7      6     1.833
2-unit hesitation (-)                         12      5          2     11     9.020       p < .005
Stops without solution (/)                    10      7          8      5     .023

Y = Yates' correction.

Results

Across all 35 dichotomous variables and five hypotheses dealing with
confounding interactive and main effects, there were eight instances in which
the χ² analysis indicated significance at or below the .10 level. These will
be discussed first since the associated data is not part of this paper (see
Lucas, 1972).

Two variables, routine check of manipulations and trial and error (T),
exhibited interactive effects of pretesting × treatment (p < .03 and
p < .10, respectively). The pretested control group checked problems very
infrequently, while just the opposite was true for the unpretested control
group. However, both experimental groups showed little variation on checking
across pretesting levels. Pretested experimental subjects used trial and error
much less than pretested control subjects, while unpretested groups showed
little difference. The most significant information on pretesting × treatment
interaction was that generally the novelty of the interview situation and the
heuristic mode of instruction did not interact to have a noticeable effect on
posttest problem-solving.

The next question explored dealt with main effects of pretesting alone. There
seemed to be at least a marginal effect (p < .10) in the following three
cases: recalling related problems, committing structural errors, and stopping
without solution. The data revealed that, if anything, pretesting had a
negative effect on recalling related problems. On the other hand, pretesting
appeared to have a marginal tendency (p < .10) to reduce frequency of
structural errors. Subjects who had been pretested were probably more careful
when reading problem statements and representing conditions. Pretested
subjects showed more persistence in pursuing a problem to completion than
unpretested subjects. The latter stopped without solution in 29 instances and
the former in
only I 

There were no interactions of pretest level × treatment which seemed to affect
posttest problem-solving. Pretested control subjects committed slightly fewer
executive errors than the other groups. However, there were indications that
the experimental groups noticed and corrected errors more frequently and had
higher result scores.

In probing the relationship between pretest and posttest response levels, the
following question was posed: "Assuming no pretest × treatment interaction, do
those who tend to score high (low) for a given variable on the pretest tend to
score similarly for that variable on the posttest?" In response, one variable
stood out clearly; that variable was drawing diagrams (p < .05). It appeared
that the treatment had little influence on this behavior. Further probing of
the related checklist categories "representative diagram-yes" and
"representative diagram-no" supported the hypothesis that poor problem solvers
draw poor diagrams and good problem solvers draw good diagrams, and that
exposure to heuristics has little effect on this disposition.

A time-score analysis of variance was made of the three nondichotomous
variables dealing with time measurements. Pretesting alone did not have a
significant effect on any time variable. Moreover, pretesting did not interact
with treatment to produce effects on posttest time scores. Also, when the
pretest time scores were used as covariates to adjust posttest time scores,
there was no significant difference on time excluding looking back or on total
time, but the data indicate an appreciable difference favoring the
experimental group on time spent looking back.

After probing the data for interactions and main effects of pretesting
condition, response levels, and treatment upon posttest problem-solving and
finding minimal potential influence in most cases, the question of potential
main effects due to treatment alone remained. Combining the information from
Table 7 with the pretesting data and related tests, the following results were
observed.

I. Significant differences attributed to exposure to heuristics were found

1. on heuristic strategies:
   using mnemonic notation (p < .005)
   using methods of related problems (p < .05)
   using results of related problems (p < .05)
   separating/summarizing data (p < .02)
   reasoning by analogy (marginal, p < .10)

2. on measures of difficulty:
   rereading the problem (p < .025)
   frequency of hesitation (p < .005)

3. on performance scores:
   approach score (p < .025)
   plan score (p < .05)
   result score (p < .06)
   total score (p < .05)

Also, experimental subjects spent significantly more time looking back at a
problem (p < .08), drew fewer nonrepresentative diagrams (p < .06), and
exhibited less nonclassifiable behavior (mumbling, unclear statements,
manipulations without rationale) (p < .02). Slight, but nonsignificant,
differences which seemed to favor the experimental group were found on time
excluding looking back, checking to see if a result is reasonable, and
explicitly correcting errors.






II. No apparent effects of exposure to heuristics were found

1. on heuristic strategies:
   drawing diagrams
   representative diagram-yes
   diagrams with coordinate system
   checking (all categories, frequency or nature)
   synthetic deduction (working forward)
   successive approximation
   producing equations (translating conditions)
   reasoning by analogy

2. on measures of difficulty:
   stopping without solution

3. on errors:
   structural errors
   executive errors (all categories, frequency or nature)

There was also no significant difference observed on total solution time
or usage of algorithmic processes, the latter being almost identical for both 
heuristic groups. 

A more detailed discussion of these results can be found in Lucas
(1972).

Discussion 

The objectives of this study were to explore, conjecture, and generally set a
course for continued investigation. At its onset, certain questions were posed
as guidelines. These are paraphrased here.

1. Can a suitable instrument be devised for observing and recording process
actions and events in problem-solving of college students? If so, is the
system reliable? Which heuristics are found in problem-solving among young
adults?

2. Can instruction in heuristics effect strategy shifts or influence
problem-solving performance? If heuristics can be "taught," what effects are
observable and measurable?

3. Can instruction in heuristics be integrated into a standard
content-oriented course such as calculus without significantly restructuring
the course? If so, do heuristics learned in the context of a particular
subject transfer to more general mathematical problems?

The system of behavioral analysis became increasingly important as an end in
itself during this study. Kilpatrick's system was revised to delete
nonheuristic behaviors and to add a number of heuristics which occurred during
the pilot study. After several revisions and tests of reliability, a system
feasible for observing and recording process behavior emerged. While its
application required painstaking effort, it was, at the time, superior to any
system known to the investigator for classifying problem-solving strategies.
Moreover, the system was moderately reliable given its complexity and the
nature and number of judgments to be made.

The investigator regarded the system of behavioral analysis as simply
an approximation to one instrument for measuring various dimensions of 
mathematical problem-solving. Much work needs to be done in the area of 
instrumentation. This problem has become a major thrust of individual and 
team research by the investigator. 

Of all the behaviors represented in the coding system, inductive reasoning and
looking back in search of alternate solutions and invented problems were
observed least frequently in the protocols of first-year college students. The
investigator guessed that induction, or searching for patterns, may be a
problem-specific behavior, and whether a subject chooses to look back or not
may be a function of the experimental situation. Retrospective comments
participants made after interview sessions tended to bear this out. Further
work on the looking back heuristic (Smith, 1973) is currently in progress.

In this study, changes in strategy and improved performance were very positive
indicators that heuristics can be taught. Students exposed to heuristics
approached problems in a more organized fashion. They preferred to use
mnemonic notation, their planning was more explicit, they organized problem
information more effectively, and while the experimental treatment did not
appear to influence diagramming, students exposed to heuristics generally
constructed their diagrams more carefully.

Drawing inferences from related problems, that is, building a bridge which
connects the given situation with prior information (Wickelgren, 1974), also
appeared to be influenced positively by heuristic instruction. Experimental
subjects applied methods and results of related problems more frequently than
control subjects. This supported a similar result obtained by Larsen (1960).

The processes of synthesis (working forward) and successive approximation
(trial and error) were apparently unaffected by the heuristics instruction, as
were translating problem conditions and algorithmic exercises; however, there
were indications that emphasis on heuristics did influence reasoning by
analysis (working backward). While examining the data and unknown of a
problem, experimental subjects were frequently observed breaking the problem
down into a sequence of subproblems or subgoals. This heuristic seems to be
related to an attitude of explicit planning.

One of the most disappointing results observed throughout the study was the
general absence of looking back behaviors. Two related classifications
mentioned earlier (alternate solutions and posing new problems) were dropped
early in the study because of low frequency of observation, and in the final
analysis there were no significant differences on the nature and frequency of
checking. This outcome was puzzling in view of the fact that looking back
heuristics had been emphasized and encouraged throughout the instructional
period.

The strongest evidence of the influence of heuristics was obtained from
certain aspects of general problem-solving performance. Experimental subjects
exhibited clearly superior performance on all score attributes: approaching
the problem, devising workable plans, obtaining accurate results, and total
score. Both groups, control and experimental, were similar on frequency or
nature of errors, but the experimental group noticed and corrected errors more
often. Also, experimental subjects seemed to have less difficulty with
problems. They reread and hesitated much less frequently and they usually
started their solutions with greater ease. On the other hand, there was no
real difference between groups on stopping without solution, a behavior
related to perseverance. Finally, while there was no significant difference on
frequency of looking back, experimental subjects spent more time looking back.

The teaching experiment itself demonstrated that heuristics could be
integrated into a content-laden curricular structure like university calculus
and still have positive effects. In both the pre- and posttests, two of the test
problems were calculus problems, and the other five embodied general,
noncalculus, mathematical situations. Those subjects trained in heuristics applied
their training not only to calculus problems, but transferred it to general
problems as well. It was not clear whether parallel instruction in a standard
course or central emphasis in a special problem-solving course would be more
conducive to learning heuristics. However, the reported study demonstrated
the possibility of parallel instruction, and the author has subsequently
developed a seminar to explore the effectiveness of special problem-solving courses.

Implications for Research 

At the time of this writing there was still an acute need for exploratory
teaching experiments aimed at learning and teaching heuristics. Information
gained from such studies can provide direction to researchers and provide fresh
ideas for the learning and teaching of mathematics.

The study reported here was concerned with many heuristics. Looking 
back, it would probably be well to limit the scope of similar investigations to
only one or a few heuristic behaviors. Having many similar behaviors to sepa- 
rate and make judgments about results in confusion when basic definitions and 
coding decisions have to be made. 



Another limitation of this study was the relatively small number of
subjects. However, this design can be defended since the mode of interview and
protocol analysis demands small groups, clinical settings, or case studies. It is
not feasible to collect data in this manner from large groups without using
many additional trained coder-interviewers whose judgments have been
cross-checked for reliability. It is also this writer's opinion that problem-solving
research, especially the heuristic dimension, has not advanced to the point of
employing rigorous experimental designs using large randomly selected
groups and associated statistical tests of high power.

Systems of behavioral analysis and instruments for measuring heuristic
actions within problem-solving need further refinement and testing.
Optimally, a system is needed which is sensitive to heuristics, other
problem-solving events, and performance as well as one which is applicable to all problem
areas of mathematics and human developmental levels. One implication of this
study is that any such system must incorporate the concept of process- 
sequence. In the study, frequencies of heuristic usage and score measures were 
used. This was a shortcoming, because other important information is accessi- 
ble through this system. For example, there are behavior patterns and styles 
peculiar to individuals and problems. Some subjects initiated a problem solu- 
tion without any explicit plan, others specified a complete plan at the outset, 
and still others produced fragments and subplans as the solution progressed. 
Similarly, some subjects never checked their work, others checked the entire 
problem after completion, and others checked back after each productive step. 
These are examples of patterned or stylistic behavior that should motivate 
further investigation for which elaborate instruments will be necessary. 

The lack of inductive reasoning and looking back behaviors raises 
questions about task variables, problem-specific heuristics, and situation vari- 
ables. Kilpatrick (1975) has discussed research variables and methodologies 
quite thoroughly. Questions related to problem-specific heuristics might be:
"Do certain problems, by their structure, tend to elicit pattern search
behavior?"; "Does the way in which a question is asked influence the heuristic
direction a solver will take?"; and "Does interview time, presence of
interviewer, or number of test problems inhibit certain heuristics, e.g., looking
back?" The study reported here suggests that the thinking-aloud data-gathering
method be coupled with retrospection by the solver to maximize the
information available about the solution process.

Assuming that heuristics can indeed be taught, the larger question,
"How?", still remains with us. Is Polya's system of questions and suggestions
sufficient? Must instruction include many mathematical problems? Will
instruction using selected problems and sequences yield the same results?
Should heuristics be identified explicitly and their application demonstrated in
an expository manner, or should the teaching process evolve organically,
implicitly, and without labels? Specifically, what actions of the teacher promote
effective use of heuristics in the learner? Experimentation is needed which
better controls the teacher variable.

The question of retention of heuristic strategy remains unresolved.
Once learned, do heuristics need to be practiced? How long must students be
exposed to instruction in heuristics before positive, lasting effects take hold? 
Do heuristics learned on specific classes of problems transfer to general 
problems and vice-versa? While this study hints at positive effects after apply- 
ing heuristics to calculus problems, this investigator has a suspicion that a 
course in general problem-solving would be more effective. 

The reported study involved young adults at the college level. Some of
Polya's heuristics were present in their problem-solving and some were not. 
Further study is needed to uncover relationships between developmental level 
and learning heuristics. Are there "golden moments" for learning certain
heuristics? Are some heuristics never learned unless explicitly taught? What
can we do in elementary and secondary schools to promote communication and 
use of heuristic methods for better problem-solving? 

Finally, a process-oriented system of scoring problem solutions needs 
to be developed (see Kantowski, 1974). What makes a good problem solver? 
We, as researchers and teachers of mathematics, have an idea of what we
mean by a good problem solver, but we have not specified well-defined
characteristics. For example, one problem solver may exhibit many varied and
sophisticated heuristic processes in a correct solution while another may use
apparently few heuristics but solve the problem much more quickly. Who is the
better problem solver? What is an "elegant" solution? Perhaps we will always
have differences of opinion in answering these questions, but it is time we 
address them squarely. 

Mathematical problem-solving and heuristic strategy need much more 
attention from research. Research efforts in this area are closely linked with 
the core of teaching and learning in the classroom, for it is through problem- 
solving that students discover mathematics and it is through heuristics that 
students discover problem-solving. 

Postscript 

The research reported here has since been extended by the writer to
motivate curriculum development and research in mathematical problem-solving.
One very pleasant and challenging outgrowth of this research was the
development of a university mathematics seminar specifically concerned with
mathematical problem-solving. The course was designed by the author during
a 2-year period following the completion of his dissertation, and was offered
for the first time in the spring term of 1974 at the University of
Wisconsin-Oshkosh. Its basic structure reflects an integration of mathematics and
psychology: the heuristic concepts of Polya and others (see Rubinstein, 1974 and
Wickelgren, 1974), the instructional format of R.L. Moore (Whyburn,
1970) in which participants present and analyze their problem solutions, and
a collection of nonroutine problems drawn from various branches of
mathematics at an elementary level. The chief objective of this seminar is to improve
communication of mathematical problem-solving, with enhanced individual
problem-solving as a potential byproduct. It is clear to the writer that a
one-semester course in problem-solving probably has little effect on long-term
strategy shifts in the problem-solving of adults. However, getting preservice
and inservice teachers of secondary school mathematics conversant with
questions, suggestions, alternate solutions, patterns, analogies, and heuristic
strategy is a step in the right direction.

Unlike most mathematics courses, this seminar offers problems se- 
lected with the intent of reinforcing heuristic strategy rather than specific 
mathematical concepts. The problems are the medium of instruction, while
the "concepts" are heuristic techniques. Consequently, participants are
encouraged to talk about their problem solutions, especially their strategies.
Each solution is critiqued by the group; the solver must defend not only his or
her mathematical statement, but also the reasoning behind it.

The seminar in mathematical problem-solving has provided fertile 
ground for cultivating ideas on teaching and learning mathematics; it also has 
motivated extended research on heuristics. The writer has developed new per- 
spectives on heuristics by observing and discussing the problem-solving of ad- 
vanced undergraduate and graduate students. The use of symmetry as a tool in 
solving problems, the transfer of a technique from one branch of mathematics 
to another (for example, Polya's level lines strategy in analytic geometry and
Lagrange multipliers in analysis), and the distinction between Pappus's
working backward technique and Wickelgren's (1974) subgoals method are 
examples of these new perceptions. 

In spring 1975, the Georgia Center for the Study of Learning and 
Teaching Mathematics (GCSLTM) called together 45 researchers in mathe- 
matical problem-solving for a conference on research issues and ideas. The 
primary thrust of this and several other GCSLTM conferences was an attempt
to consolidate research in mathematics education. The writer was invited
to the problem-solving conference, which had the theme "Heuristics." At
the conference, current variables, models, and methodologies for research were
summarized (see Hatfield, 1975 and Kilpatrick, 1975) and a report on
"teaching experiment" research in the Soviet Union was given (Kantowski,
1975b). A general report of GCSLTM research activities was presented to the 
International Congress on Mathematical Education at Karlsruhe, West Ger- 
many in the summer of 1976 (Hatfield, 1976). 

One of the tangible outcomes of the GCSLTM problem-solving conference
was the formation of research teams around shared areas of interest for
collaboration on common researchable problems. Currently, the writer is a
member of a six-person team conducting clinical studies on heuristics at
various developmental levels. Adult participants from the writer's
problem-solving seminar have served as subjects for a small-scale teaching experiment
modeled after the dissertation study reported here, but including more
heuristics and a modified system of analysis. Over a 2-year period this team has
concentrated on developing a team-constructed, reliable, effective system of
behavioral analysis. Part of this effort includes developing an adequate system
for scoring problem solutions. A report of the team's activities was made to the
National Council of Teachers of Mathematics at their national convention in
Cincinnati in spring 1977. A monograph is in press at the time of this writing.

During the next decade, collaborative efforts by researchers using the
mathematics classroom as a laboratory should yield effective models for
learning and teaching heuristics at all levels.

Chapter 6



A Multidimensional Exploratory 
Investigation of Small Group-Heuristic 
and Expository Learning in Calculus 

Norman J. Loomer 

The research reported in this chapter explored the effects of two different
methods of calculus instruction: the small group-discovery method
pioneered by Davidson (see Chapter 3) and an expository method. In this study
carefully prescribed models of teacher behavior for the two methods and an
inventory, called the Measure of Teacher Fidelity to the Model, were
developed. The inventory records students' perception of the teacher's classroom
behavior. The models and inventory will make extension of this research
easier, since they enable an investigator to show that the instruction was
administered as prescribed.

The author taught two intact college Calculus I classes, one by each
method, for one semester. He selected criteria for a multidimensional
comparison of the two methods, and selected and developed instruments to measure
instructional outcomes along those criteria. Instructional outcomes were
measured at three points during the study: Observation 1 occurred immediately
before the instructional phase, Observation 2 took place at the end of the
instruction, and Observation 3 occurred 1 month after Observation 2,
immediately following a college vacation. This chapter reports the results of the
evaluation and analyzes the evaluation procedures and instruments. The study was
exploratory, not experimental. It was only a first step toward bridging the
large gap between Davidson's (1971a) original feasibility study and a
large-scale, carefully controlled, experimental study. The statistical analyses,
therefore, do not yield firm conclusions but rather define and sharpen hypotheses
about the differential effects of the two methods. The results must therefore be
regarded as tentative.

Discovery teaching is sometimes assumed to be synonymous with
heuristic teaching, but Higgins (1971, p. 487) observes that to a mathematician
the word heuristic has an infinitely richer meaning than simply discovery. He
urges that a teaching technique be called heuristic if it (a) approaches content
through problems, (b) reflects problem-solving techniques in the logical
construction of instructional procedures, (c) demands flexibility for uncertainty
and alternate procedures, and (d) seeks to maximize student action and
participation in the teaching-learning process (p. 494). Believing that the small
group-discovery method meets these four criteria, the author has taken the
liberty of renaming it the small group-heuristic method.

Method

Instructional Procedures

Many investigations of teaching methods have been criticized because
quite different methods are labeled the same. Moreover, the methods
compared are seldom described in behavioral terms that are precise enough to
allow replication (Fey, 1969, pp. 536, 545; Richards, 1973, p. 149; Tanner, 
1969, p. 654; Wittrock, 1966, pp. 44-45). To avert such criticism of the 
present study, models of rigorously defined patterns of teacher behavior for the 
small group-heuristic method and an expository method of teaching calculus 
were developed. 

The models grew out of discovery-expository research in elementary
school arithmetic by Worthen (1968, pp. 225-227) and Robertson (1970, pp.
30-38). Modifications of the discovery teaching model were made to make it
appropriate for the small group-heuristic method of teaching calculus and
were strongly influenced by the instructional practices and classroom
organization employed by Davidson (1971a, pp. 100-102, 162-166) and the
heuristic questioning style advocated by Polya (1957).

The models for the two methods are differentiated along eight dimen- 
sions: (a) organization of the class, (b) initiation of learning experiences, (c) 
interjection of teacher knowledge, (d) questioning and answering procedures,
(e) appraisal techniques, (f) control of student interaction, (g) use of
instructional materials, and (h) determination of policies. Brief summaries of model
teacher behavior for each method along these dimensions follow. 

Small Group-Heuristic Method 

Organization of the class. The learning unit is a group of three or four 
students. Within each group the students learn together by doing problems, 
exploring questions, and proving theorems. 

Initiation of learning experiences. Exploration precedes formalization.
The teacher initiates student exploration of a concept mainly by raising ques- 
tions. If the teacher chooses to formalize the concept, he or she waits to do so 
until the group has successfully completed its exploration. 

Interjection of teacher knowledge. The teacher tries to develop a learn- 
ing climate that permits students to show their knowledge, and therefore does 
not act as the primary source of information. Instead, suggestions are given for
solving problems only when asked or when help is clearly needed. Even then
the teacher encourages students to contribute to the solution. He or she is
receptive to different approaches to stimulate students' ideas and suggestions.

Questioning and answering procedures. When asking or answering
questions the teacher avoids giving too much information. He or she tries to
ask questions or indicate steps that could have occurred to the students
themselves. Questions should apply not only to the problem at hand, but also,
whenever possible, to other similar problems. Many times these questions
come from Polya's "How to Solve It" list (1957, pp. xvi-xvii).

Appraisal techniques. If an atmosphere for exploration is to prevail,
the teacher must be sensitive about appraising student responses. He or she
does not judge incorrect responses in a negative manner, but uses them to
stimulate a continued search for a solution. If students are unsure of a response, the
teacher may encourage a guess or hunch. Students are urged to find their own
errors by using Polya's "Looking Back" heuristics (Polya, 1957, p. 14).

Control of student interaction. The teacher encourages group members
to work together cooperatively, and to build on the ideas of others to achieve
group solutions. He or she discourages interaction between groups to avoid
interfering with each group's opportunity to achieve its own solution.

Use of instructional materials. A group is led to discovery of a
mathematical concept by exploring problems prepared by the teacher and
distributed to each student.

Determination of policies. By class discussion and decision-making by
majority vote, the students determine policies on grading, scheduling of
examinations, the manner of forming work groups, and standards of behavior for the
work groups.

Expository Method 

Organization of the class. The learning unit is the entire class. The
students learn mathematics by watching the teacher, asking questions,
responding to questions, and doing daily homework problems.

Initiation of learning experiences. Formalization precedes exploration.
The teacher states definitions, proves theorems, and describes concepts before
exploring them by means of examples. Then the students can explore them in
homework problems.

Interjection of teacher knowledge. The teacher acts as the primary
source of mathematical knowledge, by indicating to students that he or she will 
always be able to work a problem correctly if they cannot. 

Questioning and answering procedures. The teacher asks the class
questions that are simple, close-ended, and directed specifically to the concept
being discussed. He or she immediately recognizes incorrect answers, gives
students an opportunity to correct their own mistakes, and responds to student
questions by reiterating a principle or relationship. The teacher may use an
example to clarify the way a principle or relationship is used to solve a
problem.

Appraisal techniques. The teacher shows great concern for errors. He
or she follows a student's incorrect responses with a discussion of why they
are incorrect, takes care not to judge students negatively, warns
students about common errors, and uses examples to emphasize them.

Control of student interaction. The teacher encourages students to
share their ideas about a problem with the class. 

Use of instructional materials. The teacher uses the textbook, which
has expository characteristics, as the primary source of materials and ideas. 

Determination of policies. The teacher determines virtually all policies, 
including the method of grading and scheduling of exams and quizzes. 

In addition to the eight dimensions along which the models for the two
methods differ, there are three dimensions along which they coincide: sufficient
teacher preparation, teacher enthusiasm, and nonevaluative climate. Brief 
summaries of model teaching behavior along each dimension follow. 

Sufficient teacher preparation. The teacher has teaching materials 
ready at the beginning of each class. 

Teacher enthusiasm. The teacher projects a sincere enthusiasm for
mathematics, for the students, and for the teaching method that he or she is 
using. 

Nonevaluative climate. The teacher does not make value judgments
when responding to students and establishes a climate in which they are free to 
respond even when uncertain of their answers. 

Criteria for the Comparison 

Many research designs used to compare two treatments have been crit- 
icized for attempting to show one method superior to another by measuring on 
a single criterion, usually achievement (Begle & Wilson, 1970, p. 368; Fey,
1969, p. 536; Shulman, 1970, pp. 36-37). This approach is insensitive to the 
differential effects of the two methods. In the present study, therefore, instruc- 
tional outcomes were measured by multiple criteria. 

A review of the literature on expository-discovery research (Brown,
1971; Fey, 1969; Hughes, 1974; Scott & Frayer, 1970; Shulman, 1970;
Tanner, 1969; Willoughby, 1969; Wittrock, 1966) revealed that the enthusiasts
for discovery learning claim that it enhances motivation and retention of
concepts, and develops problem-solving ability, the heuristics of discovery, deeper
understanding of concepts and structure, and realistic insights into how
mathematics grows. Those less enthusiastic about discovery learning counter that it
is too time-consuming, offers little to the learner that cannot be offered by good
expository teaching, and is not beneficial for the cognitively sophisticated 
individual. 

Consideration of these assertions led to the selection of the following
criteria for comparative evaluation of the small group-heuristic and expository
methods of teaching calculus: (a) calculus achievement, (b) calculus
achievement at the computation-comprehension cognitive level, (c) calculus
achievement at the application-analysis cognitive level, (d) mathematical
problem-solving achievement, (e) mathematics attitudes, (f) problem-solving
behaviors, (g) retention of calculus achievement, (h) retention of calculus
achievement at the computation-comprehension cognitive level, (i) retention of
calculus achievement at the application-analysis cognitive level, (j) retention
of mathematical problem-solving achievement, (k) retention of
problem-solving behaviors, and (l) rate of coverage of material in each method.

Tests and Measures 

Measure of Teacher Fidelity to the Model. To assess the degree to
which the teacher adhered to the models, the investigator adapted an inventory
developed by Worthen (1968). Called the Measure of Teacher Fidelity to the
Model, it was administered to the students in the calculus classes after the
instructional phase of the study. The inventory consists of a series of
statements, drawn from the models, about teacher behavior, to which the student
responds "A" if the teacher almost always did it in the class, "B" if the teacher
often did it, "C" if sometimes, "D" if seldom, and "E" if the teacher almost
never did it.

The statements are of three types: those that refer to the expository- 
heuristic characteristics of the method, those that refer to characteristics on 
which the methods are to coincide, and, among the statements drawn from the 
model for the small group- heuristic method, those that refer to operation of the 
small groups. The three types of items make up an Expository-Heuristic 
Scale, a Coinciding Characteristics Scale, and a Small Group Operation Scale. 
The 34 items on the Expository-Heuristic Scale are classified into items that, if
answered affirmatively, typify student perception of highly expository teaching
behavior (E items) and those that typify highly heuristic behavior (H items).
The five items on the Coinciding Characteristics Scale are classified into items
that typify desirable teaching behavior (DT items) and those that typify
undesirable teaching behavior (UT items). The six items on the Small Group
Operation Scale, to which only students in the small group-heuristic class
responded, are classified into items that typify desirable group operation (DG
items) and those that typify undesirable group operation (UG items).

Six sample items, one from each classification, are listed below.

5. Our teacher created the feeling that a stigma was attached if a
   student made an error. (UT)

18. Our teacher was enthusiastic toward mathematics, toward the
    students, and toward the method of teaching he was using. (DT)

19. Our teacher gave us a rule or procedure to use for solving new
    kinds of problems. (E)

33. When our teacher assisted us in the solution of a problem, he
    gave suggestions that applied not only to the problem at hand
    but also to problems in general. (H)

41. In our class the members of each group worked together
    cooperatively to achieve a group solution to a problem. (DG)

44. Our teacher encouraged members of a group to talk to members
    of another group or observe the activities of another group. (UG)

A response to an item is scored as follows: On the H, DT, and DG 
items, the response is assigned a value of 4 for almost always, 3 for often, 2 for 
sometimes, 1 for seldom, and 0 for almost never. On the E, UT, and UG items 
the scale is reversed. On each inventory an index between 0 and 100 is
obtained for each scale by multiplying the mean of that scale's item scores by 25.
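
The scoring rule for the inventory can be illustrated with a short Python
sketch. It is only an illustration under stated assumptions: the function and
variable names are hypothetical, and the five responses are assumed to be coded
A through E in the order almost always, often, sometimes, seldom, almost never.

```python
from statistics import mean

# Points for positively keyed items (H, DT, DG): A = almost always ... E = almost never.
# The letter coding A-E is an assumption made for this sketch.
POSITIVE_KEY = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}

# Item types scored on the reversed scale (here "E" means an expository item,
# not the response letter E).
REVERSED_TYPES = {"E", "UT", "UG"}


def item_score(response, item_type):
    """Return the 0-4 score for one response, reversing the scale where required."""
    points = POSITIVE_KEY[response]
    return 4 - points if item_type in REVERSED_TYPES else points


def scale_index(responses):
    """Return the 0-100 index for one scale.

    responses: list of (response_letter, item_type) pairs for that scale.
    """
    return 25 * mean(item_score(r, t) for r, t in responses)
```

For instance, a student who answers "almost always" to an H item and "almost
never" to an E item earns 4 points on each, so the index for those two items
is 25 x 4 = 100, the fully heuristic end of the Expository-Heuristic Scale.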

The following four tests were administered as shown on Table 1 to 
gather data about student achievement, attitude, and behavior in each of the 
experimental classes: 

Calculus Achievement Test. The Calculus Achievement Tests include
multiple-choice items drawn from the 1969 Advanced Placement Examination
in Mathematics (Finkbeiner, Neff, & Williams, 1971), sample Advanced
Placement Examinations (College Entrance Examination Board,
1972), and items developed by the investigator. Three judges, including the
investigator, classified each item by content category and National Longitudinal
Study of Mathematics Abilities (NLSMA) (Romberg & Wilson, 1969)
cognitive level. For the purpose of this study the four NLSMA cognitive levels
were compressed into two, Computation-Comprehension and
Application-Analysis. Each item was then entered into a content category by cognitive level
matrix, and from each cell items were randomly assigned to two forms of the
Calculus Achievement Test. Because not all topics originally scheduled were
actually covered by both calculus classes, several items were eliminated from
the scoring, resulting in abbreviated forms called Form A* and Form B*.
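
The assignment of items to test forms from each cell of the content category
by cognitive level matrix can be sketched as follows. The text does not say
how the random assignment was carried out, so this is one plausible procedure
rather than the investigator's own: the function name, the tuple layout, and
the shuffle-and-deal strategy are all assumptions.

```python
import random
from collections import defaultdict


def assign_to_forms(items, n_forms=2, seed=None):
    """Randomly split items across forms, cell by cell.

    items: iterable of (item_id, content_category, cognitive_level) tuples.
    Returns a list of n_forms lists of item ids.
    """
    rng = random.Random(seed)

    # Group the items into cells of the content-by-cognitive-level matrix.
    cells = defaultdict(list)
    for item_id, content, level in items:
        cells[(content, level)].append(item_id)

    forms = [[] for _ in range(n_forms)]
    for cell_items in cells.values():
        rng.shuffle(cell_items)
        # Deal the shuffled items out in turn, starting at a random form, so
        # every form draws as evenly as possible from each cell.
        start = rng.randrange(n_forms)
        for i, item_id in enumerate(cell_items):
            forms[(start + i) % n_forms].append(item_id)
    return forms
```

Assigning within cells, rather than across the whole item pool, keeps the
forms balanced on both content and cognitive level, which is what makes
scores on different forms comparable.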

Table 1

Administration Schedule for Criterion Measures

Observation 1               Observation 2               Observation 3
--------------------------  --------------------------  --------------------------
Calculus Achievement        Calculus Achievement        Calculus Achievement
Test, Form A*               Test, Form A*               Test, Form B*

Mathematical Problem        Mathematical Problem        Mathematical Problem
Solving Test, Form A        Solving Test, Form B        Solving Test, Form C

Mathematics Attitude        Mathematics Attitude
Scale                       Scale

Problem Solving Attitude    Problem Solving Attitude
Scale                       Scale

Problem Solving Interview   Problem Solving Interview   Problem Solving Interview

Measure of Teacher          Measure of Teacher
Fidelity to the Model       Fidelity to the Model

Problem-solving achievement tests. Problem-solving achievement was
defined to be the score on a multiple-choice problem-solving test. The
problems were to be what Dodson (1972, pp. 3-6) calls "insightful": not
routine textbook problems but problems requiring original thinking. A wealth
of these problems was found on the Preliminary Contest Examinations of the
Wisconsin Section of the Mathematical Association of America, offered
annually in Wisconsin high schools "to discover and encourage talented students"
(Buck, 1959, p. 202). Problems from the examinations were classified by
difficulty level and content category (algebra, geometry, or number systems),
and items from each cell of the resulting matrix were randomly assigned to
three forms of the Mathematical Problem Solving Test. Responses to items on
both this test and the Calculus Achievement Test were scored 4 if right, -1 if
wrong, and 0 if omitted.
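
The scoring rule for the two multiple-choice tests (4 if right, -1 if wrong,
0 if omitted) is simple enough to state as code. The sketch below is
illustrative only; the function name and the use of None to mark an omitted
item are assumptions, not details taken from the study.

```python
def score_responses(responses, answer_key):
    """Score one student's answer sheet: +4 right, -1 wrong, 0 omitted.

    responses: dict mapping item id to the chosen option, or None if omitted
               (items missing from the dict are also treated as omitted).
    answer_key: dict mapping item id to the correct option.
    """
    total = 0
    for item, correct in answer_key.items():
        chosen = responses.get(item)
        if chosen is None:
            continue  # omitted items contribute nothing
        total += 4 if chosen == correct else -1
    return total
```

A student with one right answer, one wrong answer, and one omission on a
three-item test would therefore score 4 - 1 + 0 = 3.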

Mathematics attitude measures. Romberg (1969, p. 481) argues that a
single, global measure of attitudes toward mathematics is not realistic, since
there is probably a set of feelings that vary from computation to
problem-solving. The assessment of attitudes in this study, therefore, had three phases:
a measure of general attitudes toward mathematics; a measure of attitudes
toward problem-solving, which was an important variable in the two
instructional methods; and an open-ended questionnaire to elicit the reaction of the
small group-heuristic class to their teaching method. The measure of general
attitudes toward mathematics was Aiken and Dreger's (1961) "Revised Math
Attitude Scale," which was called the Mathematics Attitude Scale during this
study. It consists of 10 statements connoting negative attitudes and 10
statements connoting positive attitudes toward mathematics, to which the student
responds with one of five Likert alternatives. The Problem Solving Attitude Scale
was constructed by the investigator, who selected 16 statements specific to
problem-solving, eight positive and eight negative, from
attitude-toward-mathematics instruments developed by Coon (1969, pp. 175-176), Cummins
(1958, pp. 179-181), and Worthen (1965, pp. A3.30-A3.31). For each
statement the student responds with one of five Likert alternatives. The Small Group
Calculus Class Questionnaire is an adaptation of the one developed by
Davidson for his feasibility study. The student responses for each question were
classified into various categories and counted.

Procedures for assessing problem-solving behaviors. Student problem- 
solving behaviors were assessed using the diagnostic procedure developed by 
Kilpatrick (1967) and Lucas (1972) (also see Chapter 5). Students
participated in 1-hour interviews during which they thought aloud as they solved
three mathematical word problems. The interviews were tape recorded, and
the taped commentaries and written work were used as the basis for analysis
using Kilpatrick's and Lucas's system of behavioral analysis. The system used
in this study evaluates 59 aspects of problem-solving activity, which fall into
five categories: (a) heuristic strategies, (b) modes of difficulty, (c) types of
errors, (d) performance measured by time, and (e) performance measured by
score. 

Procedures for the Pilot Study 

The pilot study of the two instructional methods was conducted during
the fall semester of 1973-1974 at Ripon College, a small, private,
coeducational, liberal arts college in east central Wisconsin. Two classes of
Calculus I were offered, both taught by the investigator, for which students
registered in the usual way. Students were therefore not assigned randomly to
the two classes, but neither were they selected in any special way. It was
expected that there would be no significant initial differences between the
two classes on any of the selected criteria and that the statistical design
would adjust for any minor differences. Teaching methods were assigned
randomly to classes.

Because teacher interaction with the small groups would be an important
and time-consuming activity in the small group-heuristic class, enrollment
was limited to 16 students. Enrollment in the expository class was not
limited: 25 students registered, and 16 of them were randomly selected to
participate in the evaluation phases of the study.

There were three observations of student performance: Observation 1
at the beginning of the semester immediately before the instructional phase,
Observation 2 at the end of the semester upon completion of the instructional
phase, and Observation 3 one month after Observation 2, immediately following
a college vacation.

Table 1 lists the measures administered at each observation. For
Observation 1 all measures except the problem-solving interviews were
administered during the first three class meetings. The interviews were
conducted in the investigator's office during the first 8 days of the
semester. The Observation 2 interviews were conducted during the last 8 days
of the semester, and with the exception of the Calculus Achievement Test the
remaining measures were administered during the last two class meetings. The
Calculus Achievement Test was administered during the final examination at
the same time to students in both classes. The Observation 3 interviews were
conducted during the first 8 days of the second semester. The remaining
measures were administered during a special testing session on the day before
that semester's classes began.

Subjects 

Most of the students participating in the evaluation phases of the study
were 18-year-old male freshman mathematics or science majors with 8 semesters
of high school mathematics and a mathematics grade point average above 3.00.
Most of them had not studied calculus in high school, had ACT Mathematics
scores of at least 28 or SAT Mathematics scores of at least 600, and had
a high school percentile rank of at least ^^0. 

Attrition during the study reduced the number of subjects from 16 to
13 in each class: Two students in the small group-heuristic class were unable
to attend testing sessions because of illness and one declined to participate
in the last two problem-solving interviews. Three subjects in the expository
class withdrew near the end of the semester because of unsatisfactory grades.

Instructional Materials 

In the expository class, the textbook Calculus of One Variable, Second
Edition by Seeley (1972) was used. In the small group-heuristic class the
instructional materials were based largely on prepublication materials for
the book Calculus: A Student Discovery Approach by Davidson and Leach (1973).
The author found it necessary, however, to revise these materials because the
content did not match the topics to be covered in Calculus I, and in some
respects the organization did not conform to the author's biases about how
the ideas of calculus should be developed. In his revisions the author was
guided by the following principles: (a) the need for processes should be
established before teaching them; (b) the concrete should precede the
abstract; (c) the approach to concepts should be intuitive rather than
rigorous; (d) definitions and symbols should be introduced only after the
student has had extended experience with the ideas they represent; and (e)
"Let us teach proving by all means, but let us also teach guessing" (Polya,
1963, p. 606).

Statistical Design 

Three statistical models were used to analyze the data gathered during
the study: analysis of variance for data on three measures of academic status
prior to instruction (high school percentile rank, SAT Mathematics score,
Ripon College Mathematics Placement Test score), for Observation 1 data on
the Calculus Achievement Test, Problem Solving Achievement Test, Mathematics
Attitude Scale, and Problem Solving Attitude Scale, and for data from the
Measure of Teacher Fidelity to the Model; analysis of covariance for
Observation 2 and 3 data on the Calculus Achievement Test, Problem Solving
Achievement Test, Mathematics Attitude Scale, and Problem Solving Attitude
Scale and for three measures of time taken during problem-solving interviews;
and logit analysis (Goodman, 1970) for the analysis of 56 other measures of
problem-solving behaviors. Potential covariates for each performance measure
subjected to analysis of covariance included the three measures of academic
status prior to instruction, the Observation 1 scores on that performance
measure and on the Calculus Achievement Test, Mathematical Problem Solving
Test, Mathematics Attitude Scale, and Problem Solving Attitude Scale. The
covariates for each performance measure were selected using a step-by-step
regression procedure described by Draper and Smith (1966, pp. 171-195).



Results 

Significance Levels

For meaningful interpretation of data on instructional outcomes, it
was crucial to find clear evidence that the two classes received instruction
that differed consistently on the expository-heuristic characteristics of the
instructional model and agreed consistently on the coinciding
characteristics. It was therefore important not to infer a difference on the
expository-heuristic characteristics of instruction when one did not exist
(i.e., make a Type I error) and important not to fail to infer a difference
on the coinciding characteristics of instruction when one did exist (i.e.,
make a Type II error). Accordingly, for analysis of the data on the
expository-heuristic characteristics of instruction the significance level
was set at .01 to minimize the chance of a Type I error, and for analysis of
the data on the coinciding characteristics of instruction it was set at .10
to minimize the chance of a Type II error.


The purpose of analyzing the data on instructional outcomes was not
to make generalizations, but to probe for conjectures to serve as the basis
for future experiments. Thus the objectives of the study were threatened more
by failure to infer a treatment effect when one did exist (Type II error)
than by inference of a treatment effect when one did not exist (Type I
error). Accordingly, for analysis of the data on instructional outcomes the
significance level was set at .10 in order to minimize the chance of Type II
error.
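
The asymmetric choice of significance levels described in this section
amounts to a simple decision rule, restated in the sketch below. The three
levels are those given in the text; the dictionary keys and the function name
are hypothetical labels introduced only for illustration.

```python
# Significance levels stated in the text, keyed by hypothetical labels.
ALPHA = {
    "expository_heuristic": 0.01,   # strict level guards against Type I error
    "coinciding": 0.10,             # lenient level guards against Type II error
    "instructional_outcome": 0.10,  # lenient level guards against Type II error
}


def is_significant(p_value, analysis):
    """Return True when p_value falls below the level set for this analysis."""
    return p_value < ALPHA[analysis]


# The same p-value leads to different decisions under the two levels.
print(is_significant(0.05, "expository_heuristic"))   # False
print(is_significant(0.05, "instructional_outcome"))  # True
```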

Analysis of Initial Data 

In order to determine the comparability of the two calculus classes
prior to instruction, data on the initial measures were subjected to analysis
of variance. Table 2 shows that, contrary to the investigator's expectations,
the calculus classes were not equivalent at the beginning. On every measure
the mean of the expository class exceeded the mean of the small
group-heuristic class, and on two measures, the SAT Mathematics Test and the
Mathematical Problem Solving Test, the difference in means was statistically
significant (p < .10).

The nonequivalence of the calculus classes on these measures subjects
the results of the analysis of covariance to interpretation difficulties.
First, the covariance adjustment may not have removed all bias; some bias may
be present from a disturbing variable that was overlooked. Second, when the
covariates showed real differences between the groups, covariance adjustments
involved extrapolation. Consequently, the farther apart the groups were on
the covariate means, the more imprecise was the estimate of the difference in
the adjusted means. Thus the adjusted differences may be insignificant
statistically because the adjusted comparisons are of low precision (Cochran,
1957, pp. 265-266). Therefore, interpretation of the results of the analysis
of covariance may be speculative.
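
How a covariance adjustment moves group means can be sketched with a single
covariate. The sketch below uses the standard one-covariate formula (each
group mean is shifted by the pooled within-group slope times the distance of
that group's covariate mean from the grand covariate mean). The study itself
selected a different set of covariates for each measure, so this is a
simplified illustration, and all scores in the example are invented.

```python
from statistics import mean


def adjusted_means(groups):
    """One-covariate analysis-of-covariance adjusted means.

    groups: dict mapping a group label to a list of (covariate, outcome) pairs.
    Returns a dict mapping each label to its adjusted outcome mean.
    """
    x_bar = {g: mean(x for x, _ in rows) for g, rows in groups.items()}
    y_bar = {g: mean(y for _, y in rows) for g, rows in groups.items()}
    # Pooled within-group regression slope of the outcome on the covariate.
    sxy = sum((x - x_bar[g]) * (y - y_bar[g])
              for g, rows in groups.items() for x, y in rows)
    sxx = sum((x - x_bar[g]) ** 2
              for g, rows in groups.items() for x, _ in rows)
    slope = sxy / sxx
    grand_x = mean(x for rows in groups.values() for x, _ in rows)
    # Shift each group mean to the value expected at the grand covariate mean.
    return {g: y_bar[g] - slope * (x_bar[g] - grand_x) for g in groups}


# Invented (pretest, posttest) scores for two classes that start far apart.
data = {
    "heuristic": [(10, 20), (12, 24), (14, 28)],
    "expository": [(16, 34), (18, 38), (20, 42)],
}
print(adjusted_means(data))  # {'heuristic': 30.0, 'expository': 32.0}
```

In this invented example the observed posttest means differ by 14 points (24
versus 38) but the adjusted means differ by only 2. The wider the gap between
the groups on the covariate, the larger the shift and the more the adjusted
comparison rests on extrapolation.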

Analysis of Teacher Behavior 

Data from the Measure of Teacher Fidelity to the Model at Observation
2 provide clear evidence that the teacher taught the two classes in close
conformity to their respective models. Table 3 shows the means by class for
each of the three scales of the fidelity measure. The means on the
Expository-Heuristic Scale show that the students perceived the teacher's
behavior to be heuristic in the small group-heuristic class and expository in
the expository class; analysis of variance indicates that the difference in
means is highly significant (p < .001). Analysis of variance of data from the
Coinciding Characteristics Scale yields a nonsignificant difference in means
(p > .10), indicating that, as desired, the students did not perceive the
teacher's behavior to differ on the coinciding characteristics of the model.
The mean of the scores of the small group-heuristic class on the Small Group
Operation Scale has a confidence interval of (77.8, 88.9), indicating that the
Table 2

Analysis of Variance for Initial Measures 

Means 



Expository 



' " " 83 2 94.2 

High School Rank (100) . 689,8 

SAT Mathematics Test (800) " 60.1' 

Ripon Mathematics Placement Test (108) gg^^

Mathematics Attitude Scale (80) J^' 456 

Problem Solving Attitude Scale (64) ^ gj 

Calculus Achievement Test, Form A* (96) ^' ^ g

Computation-Comprehension Scale (52) 2.8 

Application-Analysis Scale (44) • 22*g 

Mathematical Problem Solving Test, Form A (52) ^ 



(a) Numbers in parentheses indicate maximum possible score.
*p < .05.
Table 3

Analysis of Teacher Behavior Data

                             Small group-heuristic class   Expository class
Scale                        Mean    Perfect score         Mean    Perfect score     df     F
                                     for model                     for model
Expository-heuristic         75.2    100                   24.5    0                 1/24   302.56*
Coinciding characteristics   90.0    100                   95.0    100               1/24   1.56
Small group operation        83.2    100                                             12

*p < .001.
students perceived small group operation to be in close conformity to the
model.



Selection of Covariates 

Table 4 lists the covariates selected for each of the instructional
outcome variables by the step-by-step regression procedure. The procedure
selects only variables that are significantly related (p < .10) to the
outcome variable. An unexpected discovery was that scores on Form A of the
Mathematical Problem Solving Test were significantly related not only to the
scores on Forms B and C, as would be expected, but also to five of the six
calculus achievement measures. In this study the Mathematical Problem Solving
Test was a better predictor of calculus achievement than the measures usually
used for this purpose: high school rank, SAT mathematics score, and placement
examination score. The use of a problem-solving test in predicting calculus
achievement appears to be an important subject for further investigation.

Analysis of Instructional Outcome Data 

The evidence from the teacher behavior data makes possible meaning- 
ful interpretation of the data obtained to compare instructional outcomes on 
the 12 selected criteria. Table 5 shows the results of analysis of the data for 
criteria 1-5, which concern outcomes measured immediately after the instruc- 
tional period. 

Calculus achievement. On Form A* of the Calculus Achievement Test
at Observation 2 the means, adjusted by the analysis of covariance for initial
differences between the two classes, favored the expository class, but the
difference did not approach the significance level of .10 chosen for the
criterion measures.

Calculus achievement at the computation-comprehension cognitive
level. On the Computation-Comprehension Scale of Form A* of the Calculus
Achievement Test at Observation 2, the adjusted means slightly favored the
expository class, but the difference did not approach significance.

Calculus achievement at the application-analysis cognitive level. On 
the Application-Analysis Scale of Form A* of the Calculus Achievement Test 
at Observation 2, the adjusted means favored the expository class, but the dif- 
ference did not approach significance. 

Mathematical problem-solving achievement. On Form B of the
Mathematical Problem Solving Test at Observation 2, the adjusted means favored
the expository class, but the difference again did not approach significance.

Mathematics attitudes. On neither the Mathematics Attitude Scale nor
the Problem Solving Attitude Scale at Observation 2 did the difference in
adjusted means approach significance. On the Small Group Calculus Class
Questionnaire, administered to students in the small group-heuristic class, the
reactions to the method ranged from hostile to enthusiastic. On the negative
Table 4



Covariates Selected for Instructional Outcome Measures 



Outcome measure 


Covariates selected 




Observation 2 


Observation 3


Attitude measures






Mathematics attitude 


Mathematics attitude 




Problem-solving attitude 


Problem-solving attitude 






Calculus achievement 




Achievement measures


Calculus achievement


Mathematics problem-solving








Mathematics attitude 


Computation-comprehension 


SAT mathematics 


Mathematics problem-solving 


Application-analysis 


Mathematics problem-solving 


Mathematics attitude 




Application-analysis 


Mathematics problem-solving 


Mathematics problem-solving 


Mathematics problem-solving 


Mathematics problem-solving 




Calculus achievement 






SAT mathematics 




Heuristic time score measures






Time: Excluding looking back 


Time: Excluding looking back 


None 


Time: Looking back 


Ripon mathematics placement 


Time: Looking back 






Ripon mathematics placement


Time: Total 


Time: Total 


None 
Table 5

Analysis of Covariance for Observation 2 Achievement and Attitude Measures

                                                 Observed means          Adjusted means
Instructional outcome measure (a)                Heuristic  Expository   Heuristic  Expository   df
Calculus Achievement Test, Form A* (96)          22.7       34.5         26.2       31.0         1/23
  Computation-Comprehension Scale (52)           17.6       23.1         20.1       20.6         1/23
  Application-Analysis Scale (44)                 5.1       11.5          6.9        9.7         1/22
Mathematics Problem-Solving Test, Form B (52)    12.5       16.9         14.3       15.1         1/21
Mathematics Attitude Scale (80)                  55.5       62.2         58.4       59.3         1/23
Problem-Solving Attitude Scale (64)              41.7       43.9         42.8       42.8         1/22

(a) Numbers in parentheses indicate maximum possible score.
side, most students were concerned about covering enough material and were
bothered by not having a textbook. A few students thought that the class was
less stimulating than others and decreased their interest in mathematics. On
the positive side, most students enjoyed doing problems every day, thought that
the teacher was effective in giving hints, and thought that the class was more
stimulating than others. Several said that the class increased their interest
in mathematics.

Problem-solving behaviors. Table 6 contains the χ² statistics and
significance levels for the 56 heuristic variables that were dichotomized for
logit analysis of their frequency (or score) distributions. The logit model
analyzes the data for the posttest, taking into account the student's
performance on the pretest at Observation 1. Its function with respect to
qualitative data is analogous to the function of analysis of covariance with
respect to quantitative data. No results are reported for one variable because
a preliminary analysis indicated an interaction between instructional method
and pretest response level, making interpretation of results about main
effects due to instructional method doubtful.

The only variable for which the χ² statistic indicated a main effect is
Rereads Problem. The contingency table for the variable revealed that the
small group-heuristic method produced a greater tendency to reread parts of a
problem than the expository method.

Table 7 shows the results of analysis of covariance of three measures of
time (in 15-second units) taken during the problem-solving interviews. The
statistics indicated one significant difference: The small group-heuristic
class spent more time looking back at the problem and solution after obtaining
a result.

Table 8 shows the results of analysis of the data relating to criteria 7- 
10, which concern retention. 

Retention of calculus achievement. At Observation 3, one month after
the instructional period, the adjusted means on Form B* of the Calculus
Achievement Test favored the small group-heuristic class, a reversal of the
result at Observation 2. The difference, however, did not approach
significance.

Retention of calculus achievement at the computation-comprehension
cognitive level. The adjusted means on the Computation-Comprehension
Scale of Form B* of the Calculus Achievement Test showed another reversal
from Observation 2, with data at Observation 3 favoring the small
group-heuristic class. The difference, again, did not approach significance.

Retention of calculus achievement at the application-analysis cognitive
level. The adjusted means on the Application-Analysis Scale of Form B* of the
Calculus Achievement Test showed yet another reversal from Observation 2,
Table 6

Logit Analysis for Dichotomized Heuristic Variables at Observation 2

Variable                                χ²            Variable                           χ²
Restates problem                        2.18          Comparison with known result       0.00
Mnemonic notation                       1.24          Condenses/outlines process         0.00
Representative diagram: yes             1.11          Tries to derive differently        [?].63
Representative diagram: no              1.11          Variation by analogy               0.00
Auxiliary lines                         1.20          Variation by changing conditions   0.00
Isolates focal points                   0.94          Algebraic manipulation error       1.92
Recalls related problem                 0.49          Numerical computation error        0.06
Uses method of related problem          0.00          Differentiation error              0.00
Uses result of related problem          0.75          Other executive error              0.69
Inductive reasoning                     0.18          Misinterprets data                 4.36
Routine check of manipulations          0.14          Misinterprets question             0.59
Is result reasonable?                   0.45          Other structural error             3.12
All information used?                   0.00          Score: approach                    1.76
Test for symmetry                       0.00          Score: plan                        1.45
Test of dimensions                      0.18          Random trial and error             0.00
Specialization                          0.59          Systematic trial and error         0.62
Score: result                           2.53          Reasoning by analogy               0.00
Score: total                            2.73          Not classifiable                   2.53
Reads problem                           [illegible]   [label illegible]                  0.19
Rereads problem                         4.99*         Varies the process                 0.6[?]
Separates/summarizes data               1.87          Varies the problem                 0.00
Draws diagram                           (a)           30-second hesitation               0.85
Modifies diagram                        0.36          Stops without solution             4.40
Draws diagram with coordinate system    0.46          Structural error                   2.02
Model by means of equation              0.33          Executive error                    0.62
Algorithmic process                     2.24          Corrects error                     3.27
Exploratory work with data              0.00
Deduction by synthesis                  3.12
Deduction by analysis                   4.29

(a) Results not reported because preliminary analysis indicated an interaction
between instructional method and pretest response level.



Table 7

Analysis of Covariance for Observation 2 Heuristic Time Score Variables

                               Observed means          Adjusted means
Variable                       Heuristic  Expository   Heuristic  Expository   df     F
Time: Excluding looking back   150.7      137.8        153.9      134.8        1/22   0.78
Time: Looking back               7.8        1.2          8.7        0.4        1/22   7.42*
Time: Total                    158.4      139.1        160.5      137.1        1/22   1.11

*p < .025.



Table 8

Analysis of Covariance for Observation 3 Achievement Measures

                                                  Observed means          Adjusted means
Instructional outcome measure (a)                 Heuristic  Expository   Heuristic  Expository   df     F
Calculus Achievement Test, Form B* (84)           11.2       17.5         16.3       12.4                0.48
  Computation-Comprehension Scale (44)             8.5       11.1         10.7        8.9         1/23   0.30
  Application-Analysis Scale (40)                  2.7        6.4          4.8        4.2         1/22   0.04
Mathematical Problem-Solving Test, Form C (52)    12.5       27.2         15.3       24.3         1/23   6.62*

(a) Numbers in parentheses indicate maximum possible score.
*p < .025.
favoring the small group-heuristic class at Observation 3. The difference,
however, did not approach significance.

Retention of mathematical problem-solving achievement. The analysis
of scores on Form C of the Mathematical Problem Solving Test showed a
dramatic, but curious, shift in problem-solving achievement during the
vacation period between Observations 2 and 3. While at Observation 2 the
difference between adjusted means did not approach significance, at
Observation 3 the difference was highly significant (p < .025), favoring the
expository class.

Retention of problem-solving behaviors. Table 9 contains the χ²
statistics and significance levels for the 56 dichotomized heuristic variables
at Observation 3. The only significant differences give independent
confirmation to the shift in problem-solving achievement detected by the
Mathematical Problem Solving Test. On both the total score awarded for
solution of the problems and on the subscore awarded for correctness of the
results, the expository class was favored. The difference indicated at
Observation 2 on the variable Rereads Problem did not persist until
Observation 3.

Table 10 shows no significant differences on the three measures of time
taken during the problem-solving interviews at Observation 3. The F-statistic
for Time: Looking Back, which was significant at Observation 2, falls just
short of the critical value of 2.96 for significance (p < .10) at Observation 3.

Rate of coverage of material in each method. Like most
expository-discovery studies, this one found discovery learning to be slower.
Of 34 topics scheduled to be covered in Calculus I, the small group-heuristic
class failed to cover six; the expository class covered not only all the
topics scheduled but also four optional topics.

Analysis of the Evaluation Instruments 

Reliability coefficients for the instruments used in this study are in
Table 11. The investigator chose 0.90 as an acceptable level for reliability
coefficients on the achievement measures and 0.80 as an acceptable level on
the attitude and teacher behavior measures. The reliability coefficients for
the Expository-Heuristic Scale of the Measure of Teacher Fidelity to the
Model, the Mathematics Attitude Scale, and the Problem Solving Attitude Scale
were all acceptable. Form A of the Mathematical Problem Solving Test has a
reliability coefficient of 0.82, which approaches acceptability. The remaining
instruments (Coinciding Characteristics Scale, Small Group Operation Scale,
Forms B and C of the Mathematical Problem Solving Test, and Forms A* and B*
of the Calculus Achievement Test) had unacceptable reliability coefficients.
Before these instruments are used in a large-scale experimental study, their
reliabilities must be improved. Suggestions for adding or improving items,
along with item analyses and other information regarding the validity of
Table 9

Logit Analysis for Dichotomized Heuristic Variables at Observation 3



Variable 



Restates problem 
Mnemonic notation
Representative diagram-yes 
Representative diagram-no 
Auxiliary lines 
Isolates focal points 
Recalls related problem 
Uses method of related problem 
Uses result of related problem 
Inductive reasoning 
Routine check of manipulations
Is result reasonable? 
All information used? 
Test for symmetry
Test of dimensions 
Specialization 
Score: result 
Score: total 
Reads problem 
Rereads problem 
Separates/summarizes data 
Draws diagram 
Modifies diagram 

Draws diagram with coordinate system 
Model by means of equation 
Algorithmic process 
Exploratory work with data 
Deduction by synthesis 
Deduction by analysis 



χ²


Variable 


2.61 


Comparison with known result 


0.40 


Condenses/outlines process 


203 

b*VV 


Tries to derive differently 


2.03 


Variation by analogy 


2.30 


Variation by changing conditions 


0.25 


Algebraic manipulation error 


0.00 


Numerical computation error 


0.00 


Differentiation error 


0.18 


Other executive error 


2.20 


Misinterprets data 


3.5§ 


Misinterprets question 
Other structural error 


0.00 


Score: approach 


0.00 


Score: plan 


0.75 


Random trial and error 


0.19 


Systematic trial and error 


5.23* 


Reasoning by analogy 


5.05* 


Not classifiable 


0.00 


Checks the result 


0.85 


Varies the process 


1.53 


Varies the problem 


2.98 


30-second hesitation 


1.12 


Stops without solution 


1.56 


Structural error 


0.48 


Executive error 


1.97 


Corrects error 


0.00 




1.73 




1.51 





0.00 
0.00 
3,13 



0.00 
0.40 
0.35 
0.49 
2.25 
4.42 
0.41 



2.06 
4.56 
0.00 
0.67 
0.00 



1.44 
3.13 
0.00 
1.03 
1.52 
2.97 
0.06 
1.41 



(a) Results not reported because preliminary analysis indicated an interaction between instructional method and pretest response level.
Table 10

Analysis of Covariance for Observation 3 Heuristic Time Score Variables

                               Observed means          Adjusted means
Variable                       Heuristic  Expository   Heuristic  Expository   df     F
Time: Excluding looking back   164.4      176.6        164.4      176.6        1/23   0.17
Time: Looking back               8.9        2.0          7.6        3.3        1/21   2.82
Time: Total                    173.3      178.6        173.3      178.6        1/23   0.03
Table 11

Reliability Coefficients for Instruments
Used in the Pilot Study



Instrument 



Measure of Teacher Fidelity to Model 

Expository-Heuristic Scale 

Coinciding Characteristics Scale 

Small Group Operation Scale 
Mathematics Attitude Scale 
Problem-Solving Attitude Scale 
Mathematical Problem-Solving Test, Form A 
Mathematical Problem-Solving Test, Form B 
Mathematical Problem-Solving Test, Form C 
Calculus Achievement Test, Form A* 

Computation-Comprehension Scale 

Application-Analysis Scale 
Calculus Achievement Test, Form B* 

Computation-Comprehension Scale 

Application-Analysis Scale 
Reliability
coefficients



Hoyt Test-Retest 



0.97 


0.98 


0.55 


0.64 


0.52 


0.77 


0.95 


0.94 


0.89 




0.82 




0.60 




0.64 
0.74 




0.65 




0.54 




0.75 





the instruments, may be found in the original report of this study (Loomer,
1976, pp. 200-211, 247-250).
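
For readers unfamiliar with the Hoyt coefficient listed in Table 11, the
following sketch shows the usual analysis-of-variance form of the estimate:
one minus the ratio of the residual mean square to the between-students mean
square in a students-by-items layout. The function name and the small score
matrix are hypothetical, and the sketch is not a reconstruction of the
computations in the original report.

```python
def hoyt_reliability(scores):
    """Hoyt's analysis-of-variance reliability estimate.

    scores: list of rows, one per student, each a list of item scores.
    """
    n = len(scores)      # number of students
    k = len(scores[0])   # number of items
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    # What remains after removing student and item effects is error.
    ss_error = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return 1 - ms_error / ms_persons


# Hypothetical right/wrong scores for three students on two items.
print(round(hoyt_reliability([[1, 0], [1, 1], [0, 0]]), 3))  # 0.667
```

The estimate is algebraically the same as coefficient alpha, so a low value
signals that differences among students are small relative to the
inconsistency of their responses across items.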

Discussion 

This study is an exploratory probe of the differential effects of the small 
group-heuristic and expository methods of teaching calculus. The nature of 
the study was to explore the experimental method and to sharpen hypotheses, 
not to reach general conclusions beyond the particular classes and teacher that 
participated. 

The appropriate warnings having been issued, it is possible to make
some observations and conjectures. The clearest evidence of the study is that
the teaching methods used in the two calculus classes were faithful to their
respective models. However, there were few measurable differences in
instructional outcomes. Immediately after instruction there were no
statistically significant differences between the two classes on any of the
attitude or achievement measures. Analysis of 59 heuristic variables produced
only two significant differences: The small group-heuristic class reread parts
of a problem more frequently and spent more time looking back at the problem
and solution.

One month after instruction the expository class showed a surprising
superiority in problem-solving achievement, although it had not been immersed
in a problem-solving environment during the instructional phase as had the
small group-heuristic class. The statistical analyses detected no other
significant differences between the two classes on calculus achievement
measures or other heuristic variables. There was faint evidence of a reversal
on all three calculus achievement measures over the college vacation. The
adjusted means, which all favored the expository class at Observation 2, all
favored the small group-heuristic class at Observation 3. None of the
differences even approached significance, however.

The lack of differences in instructional outcomes despite clear evidence
of differences in teaching methods is puzzling. Small sample sizes, low
reliabilities of some of the evaluation instruments, or the nonequivalence of
the classes prior to instruction may have decreased the precision of the
statistical tests. A more carefully controlled, large-scale study using
improved instruments would have a better chance of detecting differences.

The results suggest that the small group-heuristic method was much
less effective in producing changes in problem-solving behaviors than Lucas's
inquiry method, which emphasized instruction in heuristics (see Chapter 4).
The key to the difference in the results of the two studies may be the
quantity of instruction in heuristics. Lucas was able, as Polya suggests, to
make continual use of heuristic questions and suggestions in the classroom.
The present investigator was able to make use of heuristic questions and
suggestions less frequently, only when a group asked for his help in solving a
problem. Furthermore, he was unable to make a systematic presentation of
heuristic suggestions; the heuristic strategy discussed at a particular moment
was the one needed by the group at that time.

Two of the instruments developed for this study appear worthy of further
study and evaluation. The success of the Measure of Teacher Fidelity to the
Model in detecting differences in the expository-heuristic characteristics of
the teacher's behavior indicates the possibility of designing other such
instruments. The Mathematical Problem Solving Test may be a good instrument
for predicting achievement in calculus.

Further exploration of the small group-heuristic method in a large-scale
experimental study now seems in order. This exploratory study has laid the
groundwork: It has selected criteria and developed instruments for evaluating
the method and measuring instructional outcomes, developed models of teaching
behavior and an instrument for measuring fidelity to the models, and
generated and sharpened hypotheses about the effects of the method.
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Chapter 7 



A Study of Problem-solving 
Performance Measures 

Donald L. Zalewski 

Purpose 

One of the primary goals of school mathematics programs is to develop
problem-solving abilities. Helping students develop these abilities and
assessing their problem-solving performance are joint concerns of curriculum,
instruction, and research. At present the most appropriate way to assess
students' problem-solving performance seems to be through the use of personal
interviews and the thinking aloud procedure (Kilpatrick, 1967; Loomer,
1976; Lucas, 1972). However, these techniques are very time consuming and
cannot easily be employed by school mathematics teachers. The study reported
in this chapter involves the development and testing of a paper-and-pencil
instrument intended to predict a student's level of problem-solving
performance.

Definitions 

The content of any problem-solving study depends on its interpretation
of the term "problem." In this study, a mathematical problem is one which
meets three conditions: (a) the statement presents information and an
objective or question whose answer is based on that information; (b) the
objective or answer to the question can be found by translating the information
into mathematical terms or by applying results from mathematics; (c) the
individual attempting to answer the question or attain the objective does not
possess an immediate answer, procedure, or algorithm which solves the problem.
If an individual solved a given problem or one similar to it previously and
simply recalls the answer or procedure, the situation would not be considered a
problem for that person. Mathematical problem solving is the process of
developing and using a procedure to solve a mathematical problem. The process
involved may require a search among possible strategies, the use of various
rules and techniques, and prior knowledge of mathematics.

Background 

Commercial Instruments 

While developing test items for a state mathematics assessment program,
the investigator realized that very few methods exist to record and assess
the mathematical problem-solving achievement of students. During the initial
part of this study, the investigator found a few procedures which claim to
measure problem-solving achievement, but an examination of these procedures
raised doubts about their validity.

Commercial tests include the mathematical problem-solving measures 
which are most available for school use. However, as the problem-solving sub- 
tests were examined, several inadequacies in the items and scoring procedures 
were detected. 

The Iowa Test of Basic Skills (Lindquist & Hieronymus, 1964),
Form 2, is identified as a problem-solving assessment instrument. However,
the items do not satisfy the definition of a mathematical problem used in this
study because direct algorithmic processes are suggested by words such as
"total" and "difference." "Mathematical problem solving" is one of the tests in
the Metropolitan Achievement Test (Durost, Bixler, Wrightstone, Prescott,
& Balow, 1970) batteries, but the items are simple verbal situations. They
require only one obvious operation suggested by questions such as "How
much more . . . ?," "How many times as many . . . ?," or "What is the area
of . . . ?" In items calling for two operations and more complex solving
behaviors, the students need only select the appropriate sentence from four
choices (the fourth being "more information needed") without actually
solving the problem.

The Instructional Objectives Exchange (IOX, 1970) identifies a major
category, "Application — Problem Solving." The questions give attention
to both process and solution, but the sample objectives emphasize the answers
to the items and a student is rated only on the number correct.

The California Achievement test battery (Tiegs & Clark, 1970) uses 
a "Problems" test which allows 13 minutes to solve 15 written items about 
money, averages, area, volume, and percents. Eight of the problems require 
only one operation and seven items require two operations. Scoring is based 
only on the number of correct responses. 

All the commercial tests the investigator examined give a choice of an- 
swers (usually four or five) for each item and score a student according to the 
number of correct choices. Though this practice permits rapid scoring, it does 
not create a genuine problem-solving situation. 

The validity of the commercial tests as problem-solving measures became
even more questionable as the investigator examined their validation
procedures. A search of both the technical and teacher's manuals of the Iowa
Test of Basic Skills (ITBS) battery (Lindquist & Hieronymus, 1964) failed
to uncover any validation procedures for their "problem-solving" test (A-2).
The writers' statement, "The most valid achievement test for your school is
that which in itself defines most adequately your objectives of instruction,"
seems to summarize their attitude toward test validity, especially in the area
of mathematical problem solving.

The Metropolitan Achievement Test manual (Durost, Bixler,
Wrightstone, Prescott, & Balow, 1971a, 1971b) discusses test validity, but
fails to provide a definition of mathematical problems or any interpretation of
test results in terms of problem-solving skills. The writers state that the
content validity of this test was established by examining textbooks, study
guides, and mathematics curriculum recommendations. The teacher's handbook
offers advice similar to that of ITBS, "Since each school has its own
curriculum, the content validity of Metropolitan Achievement Tests must be
evaluated by each school." (p. 32) Construct validity is concerned with "the
completeness of the test as a well rounded or representative sample of the
content we are hoping to measure, and also the appropriateness of the types
used." (p. 32) The test writers believe that concurrent validity and predictive
validity have little or no meaning as applied to specific tests within
achievement batteries and no validity measurements are offered.

The content validity of the California Achievement test battery is
discussed very briefly; it was based on widely accepted mathematics curriculum
objectives in the United States.

The examination of commercial tests as mathematical problem-solving
measures revealed several reasons to doubt their validity: (a) the "problems"
were usually simple written items (often referred to as "word" or "story"
problems) which did not meet this study's definition of mathematical
problems; (b) the scoring only focused on the correct response without
considering the processes used; (c) the tests set time limits which gave
students little opportunity to practice problem-solving techniques; and (d) the
test writers provided no validity measures except the usual content validity
statements. Thus, the commercial tests were judged not to be valid mathematical
problem-solving measures and other procedures were examined.

Research Procedures 

Research in problem solving has been hampered by semantic ambiguities,
overgeneralizations, and lack of consolidation of efforts; however, some
helpful directions and procedures have resulted. It is generally agreed that the
products of problem solving (responses, results, or completed methods) do
not permit sound inferences about the processes used and that it is necessary to
study subjects' observable behaviors to better analyze problem-solving
practices. Several procedures of varying utility and validity have been devised
to generate and record an observable sequence of behavior. Bourne and Battig
(1966) described a sample of frequently employed methods and commented
on their limitations. For example, manipulative devices such as pendulum
problems (Maier, 1931) or jars of water (Luchins & Luchins, 1950) only
revealed a few of the hypotheses or hunches a subject was entertaining at a
given moment. The limitations of attempting to infer process from external
actions made the direct exploration of mental processes a desirable alternative.

The direct investigation of problem-solving processes requires subjects
to verbalize during or after the solution search. Introspection requires a
subject to solve problems and report on thoughts, reactions, and feelings while
performing. Though introspection externalizes thought patterns, there are
serious questions about the distortion and interference introduced by the
experimental procedure. Retrospection requires the subject to give a narrative
account of his or her thoughts and processes after having completed the
problem-solving task. Broder and Bloom (1950) found that when this procedure
was used to observe problem-solving tasks some steps were forgotten and
rearrangement of the remaining steps in a more logical order resulted. In
addition to these internal deficiencies, the two verbalization techniques are
expensive in time and equipment, and require careful training of both subjects
and observers.

One method that avoids some of these difficulties is the thinking aloud 
technique in which the subject simply verbalizes (without analyzing) 
thoughts while working, and these statements are recorded. The thinking 
aloud method has been criticized and questioned, but evidence concerning 
whether speech and thinking complement or interfere with each other has 
been inconclusive. Kilpatrick (1967) was willing to risk these possible dan- 
gers in return for the helpful information that can be gained. 

The method of thinking aloud has the special virtues of being both pro- 
ductive and easy to use. If the subject understands what is wanted— that 
he is not only to solve the problem but also to tell how he goes about 
finding a solution —and if the method is used with the awareness of its 
limitations, then one can obtain detailed information about thought 
processes, (p. B) 

The increasing recognition and use of the thinking aloud procedure in 
research studies provided sufficient reason to assume that the procedure was 
valid for identifying problem-solving behaviors. The patterns and processes 
revealed by subjects' responses during taped pilot study interviews added to 
the investigator's confidence that the thinking aloud procedure reflects genuine 
problem-solving behaviors. 

A complication of the thinking aloud procedure is that the recorded
verbal data has to be analyzed and classified. During his investigation of eighth
graders' problem-solving performance, Kilpatrick (1967) devised a general
guide for coding audiotaped protocols of students thinking aloud while solving
mathematical problems. Subsequently, he devised a comprehensive system
which included a checklist and a model for coding the chain of behaviors
occurring in a subject's protocol.

Lucas (1972) extended Kilpatrick's classification system in a study
involving heuristic problem-solving strategies in calculus. He altered the
checklist, added symbols, made numerous revisions in the process coding
system, and developed a scoring system based on performance within a problem.
Although Lucas used his revised coding scheme to detect changes in heuristic
solving strategies in calculus, the form is easily adaptable to any study
involving mathematical problem solving in thinking aloud interviews.

Goals of This Study 

The investigation of research procedures reported here found a method
that was assumed to be valid and reliable for recording and assessing
mathematical problem-solving behaviors. The first goal of this study was to
record, assess, and rank the mathematical problem-solving performances of
seventh-grade students using the thinking aloud procedure and Lucas's refined
coding system. Two questions related to this goal were considered:

1. How well do the thinking aloud procedure and coding scheme
capture and classify the mathematical problem-solving behaviors of
seventh-grade students?

2. Is it possible to separate and rank seventh-grade students according
to their coded problem-solving protocols?

Analysis and evaluation of the coded data from thinking aloud sessions
was assumed to be a valid method of classifying students' mathematical
problem-solving performances. However, the method is not readily used in schools
because of its physical limitations: Only one subject can be tested at a time;
considerable time and expense are involved in recording, coding, and
evaluating each performance; and specially trained interviewers and coders are
needed. These factors would make a large-scale school assessment financially
impractical, if not impossible. For an individual teacher, the lack of interview
and coding skills could be a handicap, and finding additional time for
interviews in an already crowded schedule makes the time required for the
thinking aloud procedure a deterrent for classroom use.

A practical alternative for measuring problem-solving performance is
a paper-and-pencil instrument, as it requires only simple materials and need
not be administered by specially trained personnel. The second goal of this
study was to investigate the feasibility of producing a written instrument that
reflects the mathematical problem-solving ability of seventh graders.
Specifically, the question being asked was, "Is it feasible to construct a
written evaluative instrument whose results correlate well with the ranking
derived from the coded protocols?"

Study Design

This study had three principal parts. First, the problem-solving
performances of students observed thinking aloud were recorded, analyzed, and
ranked. Second, a written test (WT) was devised and administered to the
same students to provide a second ranking. Third, the correlation between the
two ranks was determined. Details of each part of the study will be discussed
separately.

Part I: The Complex Problem-solving Assessment Procedure 

In planning to use the complex interview and coding procedure, the 
mathematical problems, subjects, interviews, coding system, and ranking pro- 
cedures all received careful scrutiny. These considerations will be described in 
turn. 

The mathematical problems. Six interview test (IT) items were drawn
from a pool developed by the investigator using the definition of "mathematical
problem" given earlier. The judgments of mathematics educators, the results
of a pilot study, and an examination of the mathematics curriculum provided
in textbooks were used to screen items and strengthen content validity. To
prevent computational difficulty from being an important factor, the
arithmetic in the problems was kept simple.

The subjects. The seventh-grade level was chosen for this study. A sin- 
gle grade was chosen to restrict the scope of the study and to extend the work 
started by the investigator during the state assessment of seventh-grade 
students. 

The interviews. Seventh-grade students solved six mathematical
problems in a thinking aloud taped interview. The interview procedures,
developed by Kilpatrick (1967) and Lucas (1972), are detailed in the author's
dissertation (Zalewski, 1974).

The coding system. The coding system for this study was a combination
of Kilpatrick's (1967) and Lucas's (1972). Lucas developed a five-point
scoring system based on a subject's complete protocol for a problem. He totaled
the points for Approach (0 or 1), Plan (0, 1, or 2), and Result (0, 1, or 2).
Lucas's scoring procedure was followed as the IT ranking of students was
developed. Lucas's system is a modification of Kilpatrick's, but since Lucas
used his system to code the behavior of calculus students, some symbols and
items were eliminated. Other revisions were made according to the results of a
pilot study.
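
The five-point scheme described above can be sketched in a few lines of
Python. This is only an illustration: the class and function names are
hypothetical, the sample protocol is invented, and treating a Result of 2 as
"correct answer" is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemScore:
    """Score for one problem: Approach (0-1), Plan (0-2), Result (0-2)."""
    approach: int
    plan: int
    result: int

    def __post_init__(self):
        # Reject values outside the ranges given in the text.
        if self.approach not in (0, 1):
            raise ValueError("approach must be 0 or 1")
        if self.plan not in (0, 1, 2):
            raise ValueError("plan must be 0, 1, or 2")
        if self.result not in (0, 1, 2):
            raise ValueError("result must be 0, 1, or 2")

    @property
    def total(self) -> int:
        # At most 1 + 2 + 2 = 5 points per problem.
        return self.approach + self.plan + self.result


def process_score(protocol):
    """Total process score over all problems in one subject's interview."""
    return sum(p.total for p in protocol)


def correct_answers(protocol):
    """Count of problems with full Result credit (assumed to mean 'correct')."""
    return sum(1 for p in protocol if p.result == 2)


# Invented six-problem protocol for one subject.
example = [
    ProblemScore(1, 2, 2), ProblemScore(1, 1, 0), ProblemScore(0, 0, 0),
    ProblemScore(1, 2, 1), ProblemScore(1, 1, 2), ProblemScore(0, 1, 0),
]
print(process_score(example), correct_answers(example))
```

With six interview problems the total process score under this sketch ranges
from 0 to 30.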

The ranking. Two measures were applied after coding and scoring the
subjects' protocols. The number of correct answers was the simplest measure
while the total process score (or any of the subscores) provided a second basis
for ranking subjects. Both statistics were considered in determining the
students' IT rankings.

A third basis for ranking interview subjects was a statistical analysis of
their coded protocols. Latent partitioning (Lord & Novick, 1968; Torgerson,
1958) or a type of clustering analysis (Hubert, 1973) was applied. Based on
patterns of the coded behaviors, these statistical procedures provided a
separation of the subjects into subgroups. Then the investigator determined an
ordering between and within subgroups to provide yet another ranking of the IT
subjects.

Part II: The Written Test 

In this part of the study a paper-and-pencil instrument (WT) was
devised to provide a second ranking of the subjects who participated in the Part
I interviews. Subjects took the WT and were ranked according to the results.
The correlation between the rankings of Parts I and II was established
statistically in Part III. A high correlation would provide the concurrent
validity needed to suggest that substituting a written test for the complex
interview and coding procedure is feasible.

The WT items. It was desired that the WT be related to mathematical
problem solving. Thus, the items chosen for the paper-and-pencil instrument
were mathematical in nature, nonroutine, and open-ended. Manipulations,
symbols, number size, and number of steps were kept within the ability of
seventh graders.

The WT items were not the same as the mathematical problems used
in the IT. Some items in the WT required only one-step solutions and did not
meet the criteria of mathematical problems, but all attempted to avoid simple
recall of knowledge. An item pool was created in accord with these criteria; it
was randomly sampled in generating the WT. For convenient school use, the
WT was constructed so that it could be administered to students in one
50-minute class period.

The WT ranking. On the WT, the subjects were ranked solely on the
basis of correct responses. The answer to a problem alone does not reveal the
solution processes involved, but the WT was not designed to measure
processes. Its only purpose was to provide a second ranking of the same
students who were given the IT.

Part III: The Comparison of Ranks 

The third part of this study was designed to test similarities between 
the rankings developed in Parts I and II. 

Correlations. After the rankings from Parts I and II were established,
Spearman's rank order correlation coefficient and Kendall's τ (1955) were
computed. A correlation of at least .71 would indicate the WT scores account
for approximately 50% of the variance in the IT ranks and establish concurrent
validity of the WT. This was determined to be the minimum correlation
to support the feasibility of using the WT as a substitute for the thinking aloud
and coding procedure.
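
A minimal Python sketch of the two rank statistics and the .71 criterion
follows. The scores are invented for illustration, the function names are
hypothetical, and the tie-corrected tau-b form of Kendall's statistic is an
assumption, since the text does not say which variant was computed. The
criterion itself is simple arithmetic: .71 squared is about .50.

```python
from itertools import combinations
from math import sqrt


def average_ranks(scores):
    """Rank from highest (rank 1) to lowest; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, counting concordant and discordant pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both measures: excluded from both counts
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + tied_x_only) * (untied + tied_y_only))


# Invented WT and IT scores for six subjects (not data from the study).
wt_scores = [14, 9, 12, 5, 7, 3]
it_scores = [25, 18, 20, 15, 10, 8]
rho = spearman_rho(wt_scores, it_scores)
tau = kendall_tau_b(wt_scores, it_scores)
print(round(rho, 3), round(tau, 3), rho >= 0.71, round(0.71 ** 2, 2))
```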

The Studies 

Prior to the main study, a pilot study was conducted; that study
resulted in important changes in the main study. Thus, both studies are reported
here.

Pilot Study

The purpose of the pilot study was to try out the interview procedures
and their coding and scoring schemes and to use an initial version of the WT.
The pilot study results suggested changes in the original plan for the study and
modifications were made in the taping format, the WT length, the interview
procedures, and the checklist and coding scheme.

Audiotaping versus videotaping. During the summer of 1973, eight
volunteers who had completed seventh grade in Madison, Wisconsin, took
both the WT and the IT. After audiotaping the verbalizations of the first
subject, it was apparent that interesting physical actions and silent indications
of problem-solving processes were not being captured. For example, a subject
moved his pencil across the page as he silently reread; the audiotape recorded
only silence while this significant behavior occurred. The investigator decided
to use videotaping with four pilot subjects to explore the advantages of a visual
and audio record of the interviews. Later the use of videotape was incorporated
into the main study.

During the pilot study, it seemed that pilot subjects who were video- 
taped behaved differently than if they had been audiotaped. Thus a question 
arose: Do subjects perform differently if they are videotaped instead of being 
audiotaped? To answer this, two measures of difference based on problem- 
solving interview scores were compared through a one-way fixed effects analy- 
sis of variance (ANOVA) with the subjects randomly assigned to treatment 
groups (audiotaping or videotaping). The following hypothesis was posed: 

Hypothesis H1: The mean score on achievement for videotaped subjects
equals the mean score on achievement for audiotaped subjects.

An arbitrary significance level of .05 was chosen for rejection of this null
hypothesis.
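
The F ratio behind such a one-way fixed effects ANOVA can be sketched as
follows. The function name and the scores are hypothetical, and the sketch
stops at the F ratio and its degrees of freedom; the decision at the .05 level
would come from comparing F with a tabled critical value (about 4.18 for 1 and
29 degrees of freedom, the case of 31 subjects in two groups).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way fixed effects ANOVA."""
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, means) for s in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented interview scores for two small treatment groups.
videotaped = [10, 12, 14]
audiotaped = [13, 15, 17]
print(one_way_anova([videotaped, audiotaped]))
```

With only two treatment groups the F ratio equals the square of the
corresponding two-sample t statistic, so the same comparison could be run as a
t test.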

A second one-way fixed effects ANOVA was applied to the total time 
each subject used to solve the six mathematical problems given during the 
interviews. A second hypothesis with a .05 rejection level was posed: 

Hypothesis H2: The mean solution time of the videotaped subjects
equals the mean solution time of the audiotaped subjects.

The incorporation of videotaping into the study evoked one issue which
was not directly related to the data. Lucas (personal communication) coded
the protocols obtained during the pilot tryout in this study and observed that it
took noticeably less time to code videotaped protocols than audiotaped
protocols. To explore this difference systematically, each taping procedure was
considered a treatment, and subjects were randomly assigned to permit an
ANOVA. The hypothesis tested was the following:

Hypothesis H3: The mean coding time for audiotaped protocols equals 
the mean coding time for videotaped protocols. 

An arbitrary significance level of .10 was chosen for rejection of this
hypothesis.

Because a statistically significant difference in coding times may not be
important in practice, a second method of comparing coding times was
planned. The difference between the average coding time for 1 minute of
audiotape and the average coding time for 1 minute of videotape would be
found; if the difference between the averages was greater than 10%, that
difference would be regarded as significant.
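
The 10% rule can be written out as a small Python sketch. Two points are
assumptions, because the text leaves them open: the average is taken as total
coding time divided by total tape time, and the 10% is measured relative to the
audiotape average. The names and the timings are invented.

```python
def coding_rate(sessions):
    """Coding minutes per minute of tape.

    sessions: list of (coding_minutes, tape_minutes) pairs, one per protocol.
    Assumption: the average is total coding time over total tape time.
    """
    total_coding = sum(c for c, _ in sessions)
    total_tape = sum(t for _, t in sessions)
    return total_coding / total_tape


def practically_different(audio_rate, video_rate, threshold=0.10):
    """True if the rates differ by more than the threshold.

    Assumption: the difference is expressed relative to the audiotape rate.
    """
    return abs(audio_rate - video_rate) / audio_rate > threshold


# Invented timings: (minutes spent coding, minutes of tape).
audio = [(60, 20), (90, 30)]
video = [(50, 20), (75, 30)]
print(coding_rate(audio), coding_rate(video),
      practically_different(coding_rate(audio), coding_rate(video)))
```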

Changes in the WT. The original written test contained 16 items. For
this test, Hoyt's internal consistency measure produced a reliability of only
0.1765. This extremely low reliability could have been due to the small
number of subjects in the pilot study, an unusual interaction of subjects and
items, or the number of items on the test. It was assumed that the first two
possibilities would be compensated for in the main study by the larger number
of subjects and the random item sampling procedure. The third possible cause
of low reliability was counteracted by increasing the WT from 16 to 20 items.
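
The text does not say how much gain was expected from the four added items.
One standard way to estimate the effect of lengthening a test is the
Spearman-Brown formula; its use here is an assumption made for illustration,
not a step reported by the investigator.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor."""
    return (length_factor * reliability
            / (1 + (length_factor - 1) * reliability))


# Pilot reliability of 0.1765 on 16 items, projected to 20 comparable items.
predicted = spearman_brown(0.1765, 20 / 16)
print(round(predicted, 4))
```

Under this assumption lengthening alone would raise the pilot reliability only
slightly, to roughly .21, which fits the text's reliance on a larger sample and
random item sampling for most of the improvement.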

Changes in the interview procedures. The pilot study produced two
changes in the interview strategies. First, the apparent nervousness and haste
of pilot subjects who were videotaped suggested that extra efforts would have to
be made to put students at ease before having them think aloud while solving
problems. Subsequently, the interviewer planned to verbally emphasize that
the subjects could use as much time as they needed, would converse with each
subject until the student appeared comfortable, and would not place a clock in
a conspicuous position. The same precautions were planned for audiotaped
subjects although the presence of a tape recorder did not seem to have the same
effect as a camera.

The second change in the interview procedures resulted from following
Lucas's (1972) practice of verbally encouraging a subject to think aloud if the
student fell silent for a period of 30 seconds. When one subject was prodded
with "What are you doing now?" after a silence of 30 seconds, he appeared
slightly irritated at having his thoughts interrupted, replied "I'm thinking,"
and lapsed back into silence. Similar reactions by other subjects persuaded the
investigator to avoid interfering after 30 seconds and to use his discretion if
the subject was not talkative or overly active for more than a minute,
especially if it appeared that the subject was stymied or frustrated. The
interviewer would not interrupt a subject if it appeared that he or she was
silently devising a plan, even though this neglect would cause gaps in the
thinking aloud record of a student's problem-solving procedures.

Changes in the coding system and checklist. In addition to the changes 
in the interview procedures, some modifications of Lucas's (1972) coding sys- 
tem were suggested by the pilot study. The subjects in the pilot study never 
produced behaviors to be coded as Mf (introducing diagram with coordinate 
system imposed), Vs (varies the process), or Vm (varies the problem). These 
symbols and the related items on the checklist were eliminated from Lucas's 
(1972) format. Additional symbols were devised to classify behaviors which 
did not fit easily into Lucas's system: Rs (restates the problem in his or her 
own words), Rr (rereads the problem or parts of it), D.X (exploratory work 
with data), TR (irregular trial and error), and Ts (systematic trial and er- 
ror). The changes in the process symbols were accompanied by modifications 
in the items on the checklist. 

Main Study 

The main study was conducted according to the modified plans result- 
ing from the pilot study. The IT, WT, population, and events are described 
below. 

IT and WT. After creating a pool of 50 representative mathematical
problems, the investigator randomly selected six items for the IT. The WT
was created by randomly selecting 20 items from the pool of 165 items
described earlier.
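
Random selection of this kind can be sketched with Python's standard library.
Numbering the items 1 through the pool size, the function name, and the seed
are assumptions made for illustration; the text does not describe how the
draw was carried out.

```python
import random


def draw_items(pool_size, n_items, seed=None):
    """Draw n_items distinct item numbers from a pool numbered 1..pool_size."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return sorted(rng.sample(range(1, pool_size + 1), n_items))


it_items = draw_items(50, 6, seed=1974)    # six interview problems
wt_items = draw_items(165, 20, seed=1974)  # twenty written test items
print(it_items)
print(wt_items)
```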

Population. The study was conducted at an elementary, parochial
school located in west central Madison, Wisconsin. Its 435 first- through
eighth-grade students came mainly from middle to upper middle class families
of white-collar workers and professionals. The mathematics program in grades
5-8 was partially individualized, and students worked at their own pace.

Written test administration. The two seventh-grade mathematics 
teachers administered the WT to all 63 seventh graders. Each class had ap- 
proximately 40 minutes to complete the test with extra time allowed for those 
students who needed it. The procedures for administering the WT did not 
directly follow the plans. Originally, half of the subjects would have taken the 
WT after the IT, but school conditions dictated otherwise and all subjects took 
the WT before the IT. 

Another change in plans occurred in the WT item format. Originally,
the 20 items were to be presented in random order to each student to avoid a
sequence effect. This arrangement would have required that each of the 63
tests be typed individually. To permit rapid production of the WT, the
original plan was abandoned and all 20 items were presented in the same order
to every subject.

After the written tests were completed, the investigator visited the
classrooms to discuss the WT with the subjects and to seek their cooperation in
arranging the thinking aloud interviews. All subjects were encouraged to
participate whether or not they believed they had done well on the WT. Subjects
were not told their results on the WT.

The interview sample. To heed Kilpatrick's (1967) concern for the
pressure placed upon subjects in interview situations, subjects with at least
average mathematical ability were chosen for the interviews. No recent
achievement test scores were available to classify students, so before the WT
the two mathematics teachers were asked to identify students in their classes
who were at least average in achievement. Thirty-one average or above average
subjects were identified; all of these students accepted an invitation to
participate in the interviews.

The interview arrangements. The videotaped interviews were scheduled
for the last week in February 1974, and the audiotaped interviews were
scheduled for the next week. Sixteen of the 31 subjects were randomly selected
to be videotaped. The videotaped interviews were conducted in a mobile unit
parked beside the school. The 15 audiotaped interviews were conducted in a
meeting room in the school basement.
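
The random split of 31 subjects into 16 videotaped and 15 audiotaped can be
sketched as a shuffle followed by a cut. Subject numbers, the function name,
and the seed are hypothetical; the text does not say how the assignment was
made.

```python
import random


def assign_to_taping(subject_ids, n_video, seed=None):
    """Shuffle the subjects, then cut the list into video and audio groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_video]), sorted(shuffled[n_video:])


video_group, audio_group = assign_to_taping(range(1, 32), 16, seed=0)
print(len(video_group), len(audio_group))
```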

Data and Analyses 

This section reports the data from each of the three principal parts of 
the study. First the scores, rankings, and statistics for the written test will be 
described. The data from the interview test, the statistical analysis of the rela- 
tionship between rankings, and the results of exploratory statistical proce- 
dures follow. 

The Written Test (WT) 

The purpose of the WT was to produce an initial ranking of the
subjects; they were also to be ranked by their performance on the IT, the
mathematical problem-solving instrument. The data and statistics for the WT and
a subsequent WT2 are presented before feasibility factors are reported.

Subject response data. A total of 63 seventh-grade students took the
20-item WT. The descriptive statistics for the WT are presented in Table 1 for
the 31 subjects who had been rated as average or above average in mathematics
achievement (Group A) and the 32 students rated below average (Group B)
by their mathematics teachers.

Table 1

Mean, Standard Deviation, and Range for the WT:
Group A, Group B, and Combined

                          Number of             Standard    Range
                          subjects    Mean      deviation   (20 items)

Group A                      31       7.4194     3.8796     2 to 14
Group B                      32       3.7500     2.7238     1 to 12
Groups A and B combined      63       5.5556     3.7963     1 to 14

According to Table 1, the results on the WT were consistent with 
teacher ratings. Group A averaged 7.42 correct responses, almost twice the 
3.75 mean of the lower rated Group B. Group A omitted an average of 2.7 
items on the WT while Group B subjects omitted an average of 4.1 items. 
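
The combined row of Table 1 can be checked against the two group rows. The
sketch below assumes the reported standard deviations use the n - 1 divisor,
which the text does not state; under that assumption the group figures
reproduce the combined mean and standard deviation to four decimals. The
function name is hypothetical.

```python
from math import sqrt


def combine(groups):
    """Pool (n, mean, sd) triples into an overall (n, mean, sd).

    Assumption: each sd is a sample standard deviation (n - 1 divisor).
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    return n_total, grand_mean, sqrt((ss_within + ss_between) / (n_total - 1))


group_a = (31, 7.4194, 3.8796)
group_b = (32, 3.7500, 2.7238)
n, mean, sd = combine([group_a, group_b])
print(n, round(mean, 4), round(sd, 4))
```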

The low mean scores and the high number of items omitted by both
groups of WT subjects caused the investigator to question whether the
mathematical abilities of the 63 seventh-grade students who participated were
representative. In order to compare the subjects to other seventh-grade
students, a second 20-item written test (WT2) was developed from the available
pool with the restriction that any item which appeared on the WT could not be
used on the WT2. In May 1974, 350 seventh-grade students from Madison
and Des Moines, including the original 63 from Madison, were given the
WT2. The mean for the 63 Madison subjects on this second test was 6.11; this
was close enough to the overall mean of 5.93 to assure the investigator that
these were typical seventh-grade students and that their low mean scores were
due to the general difficulty of the items.

WT length and reliability. The low mean scores of the students did not 
affect the feasibility of the WT, but two other factors, test length and reliabil- 
ity, were also important. A test which took more than an hour to complete or 
which did not attain a reliability of .80 would not meet the expectations of the 
investigator. 

Hoyt reliabilities (Hoyt, 1941) were calculated for both the WT and 
WT2. When the scores of both Group A and Group B were used, the Hoyt 
reliability of the WT is .82; the reliability of that test is .7968 for Group A
alone and .73 for Group B alone. 

Using the scores of all 350 students who took the WT2, the Hoyt relia- 
bility of this instrument is .84. The corresponding reliabilities for the WT2 
when Group A (N = 31) only was used and Group B (N = 32) only was used
were .77 and .68, respectively. No Hoyt reliability was calculated for the WT2 
using only the scores from Groups A and B together. 
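
The Hoyt coefficient cited above is obtained from a persons-by-items analysis of variance: reliability is estimated as 1 minus the ratio of the residual mean square to the between-persons mean square. The following Python sketch illustrates that calculation under stated assumptions. The function name hoyt_reliability and the four-person, three-item response matrix are hypothetical and are not data from this study.

```python
def hoyt_reliability(scores):
    """Hoyt (1941) ANOVA reliability for a persons x items score matrix.

    scores: list of rows, one row per person, one numeric entry per item
    (for example 1 = correct, 0 = incorrect or omitted).
    Returns 1 - MS_residual / MS_persons.
    """
    n_persons = len(scores)
    n_items = len(scores[0])
    n_cells = n_persons * n_items

    grand_total = sum(sum(row) for row in scores)
    correction = grand_total ** 2 / n_cells

    # Sums of squares for the two-way layout without replication.
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_persons = sum(sum(row) ** 2 for row in scores) / n_items - correction
    item_totals = [sum(row[j] for row in scores) for j in range(n_items)]
    ss_items = sum(t * t for t in item_totals) / n_persons - correction
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n_persons - 1)
    ms_residual = ss_residual / ((n_persons - 1) * (n_items - 1))
    return 1 - ms_residual / ms_persons


# Hypothetical responses, not taken from the WT or WT2.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(hoyt_reliability(example), 2))  # 0.75
```

For dichotomously scored items this estimate coincides with the Kuder-Richardson formula 20 value, which is one way to cross-check a hand calculation.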

The calculated reliabilities demonstrate that, overall, both the WT and 
WT2 exceed the reliability level sought. Using only the scores of Group A, the 
potential IT subjects, the reliabilities of these two tests are close to the desired
level of .80, but when Group B alone is considered, the reliabilities of both
WT and WT2 fall short. However, since Group B did not participate in the 
interview phase of this study, the overall reliabilities are satisfactory, and the 
overall reliabilities for Group A are near the desired level, it was feasible to 
compare the results of this test to the problem-solving scores derived from the 
thinking aloud interviews. 

To see if the test was an appropriate length, the investigator recorded 
completion times for 59 of the 63 WT2 subjects. Mean completion time for
these subjects was 27 minutes, and the range was from 16 to 37 minutes. The 
27 minute mean indicated that seventh-grade students could respond to the 20 
items in one class period. Even subjects taking 15 minutes more than the mean 
test time would finish the WT in 42 minutes, a completion time less than the 
maximum 50 minute period. 

Written test rankings. The rank of a subject on the WT was based 
solely on the number of correct responses, and only subjects from Group A 
who participated in the IT were ranked. Since two written tests, the WT and
WT2, were administered, rankings were determined for each and are
presented in Table 2. 

As can be seen in Table 2, the rankings developed from the WT and 
WT2 are similar. They agree perfectly on subjects 8 (rank 6.5), 16, and 31, 
and agree closely on subjects 2, 10, and 27. Despite the high apparent ranking 
agreement, the investigator decided to compare each WT ranking to the IT 
ranking separately to see which test produced a stronger relationship. 

The Interview Test 

Group A, the students designated as being average or above average 
achievers in mathematics, participated in an interview test (IT) using the 
thinking aloud procedure. Their problem-solving protocols were coded, 
scored, and ranked; these data are reported next.

The thinking aloud procedure. During the thinking aloud interviews, 
the investigator observed four behaviors which might raise questions about the 
effectiveness of this procedure. The behaviors were subjects' remarks
concerning their ability to think aloud, periods of silence, use of
retrospection, and subject anxiousness. Table 3 summarizes the occurrences
of these behaviors in
the videotaped and audiotaped interviews. 

As seen in Table 3, two subjects from each taping group made direct 
comments about their ability to think aloud. For example, subject five worked 
calmly but quietly, and after reading an IT problem, explained to the
investigator, "I'm gonna figure this out in my mind and tell you when I'm
done, or else I can't get it."
Table 2

Rankings of Group A Based on the Results of the
WT and the WT2

Subject     WT number     WT        WT2 number     WT2
numberᵃ     correct       rankᵇ     correct        rankᵇ

   1            8         14           12           6.5
   2            9         11           10          10.5
   3            7         16            9          13
   4            9         11           12           6.5
   5            9         11            7          18.5
   6            7         16            4          27
   7            5         21            4          27
   8           12          6.5         12           6.5
   9            6         18.5          5          24
  10           13          4           14           2.5
  11            5         21            3          29
  12            3         27            6          21.5
  13            3         27            2          30.5
  14            3         27            7          18.5
  15           11          8           13           4
  16            4         24            5          24
  17           13          4            7          18.5
  18            9         11            8          15.5
  19           14          1.5         12           6.5
  20            4         24           11           9
  21            9         11            9          13
  22            7         16            9          13
  23            2         30            5          24
  24            2         30            2          30.5
  25           12          6.5         16           1
  26           13          4           10          10.5
  27           14          1.5         14           2.5
  28            4         24            6          21.5
  29            5         21            4          27
  30            2         30            8          15.5
  31            6         18.5          7          18.5

ᵃ The subject number represents the order of his or her appearance in the
interviews. Subjects 1 to 16 were videotaped and subjects 17 to 31 were
audiotaped.

ᵇ In case of ties on number correct, the ranks were averaged.

Table 3

Indicators of Thinking Aloud Difficulties

                                              During         During
                                              videotaping    audiotaping

Number of subjects who made comments on
  their thinking aloud ability                     2              2
Number of subjects who explained by
  retrospection                                    5              4
Number of silent pauses which occurred:
  30 to 60 seconds                                20             25
  over 60 seconds                                 19             21
Number of subjects who were judged to be
  anxious                                          7              6


Retrospection was used by subjects who explained their procedures
after they had achieved an answer. Five videotaped subjects practiced
retrospection in a total of 10 instances with one subject resorting to
retrospection on all five of the problems she solved. Four audiotaped
subjects accounted for eight instances of retrospection.

Silent pauses were periods of time when subjects produced no codable 
behavior while attempting to solve a problem. Pauses of less than 30 seconds 
were often used for assimilating information, organizing ideas, or silent reca- 
pitulation and were not considered to indicate thinking aloud difficulty. How- 
ever, pauses longer than 30 seconds usually occurred in protocols of subjects 
who had difficulties expressing their thoughts aloud. All pauses over 30
seconds were recorded and dichotomized: pauses less than 1 minute and those 
longer than 1 minute. As indicated by Table 3, silent pauses occurred fre- 
quently in both types of taping. 

The last category in Table 3 records subjects' unspoken reactions while 
participating in the interviews. Four videotaped subjects and three 
audiotaped subjects were clearly nervous. The most common and obvious signs
included tapping a pencil, scratching parts of the body, or frequent shifting of 
body positions. Three other subjects from each taping procedure exhibited less 
obvious nervous behaviors such as reading the problems rapidly or carelessly 
and sometimes slurring or mispronouncing words. 

The coding system. During the pilot study, the investigator was fortunate
to receive Lucas's personal assistance in checking the application of his
system. Calculating a direct ratio of the frequency of agreement to the total
frequency of agreement and disagreement between Lucas and the investigator,
acceptable agreement measures were computed for the process-sequence coding
(.72), the checklist (.67), and the scoring system on Approach (.93), Plan
(.86), and Result (.86). However, the modifications of Lucas's (1972) system
for this study necessitated additional agreement measures. Three coders,
among them the investigator, were used to establish those agreements. The
resulting agreement-disagreement ratios produced an agreement measure of
.83 across all variables and interjudge reliability tests produced a measure
of .80.

After agreement ratios and reliability measures were computed and 
evaluated, the coded protocols and scores were used to search for ranking 
schemes. 

The IT ranking schemes. Application of Lucas's (1972) scoring system
produced four measures for each problem: Approach (0 or 1), Plan (0, 1,
or 2), Result (0, 1, or 2), and Problem Total (0-5). The first ranking scheme
(Ranking A) was developed by summing the six Problem Totals for each
subject and assigning a rank of 1 to the highest sum and a rank of 31 to the 
lowest sum. Tied ranks were averaged. The totals and ranks for Ranking A 
are presented in Table 4 as are those for Rankings B and C. 

According to Ranking A, subject 15 had the highest total interview test 
score (24 points) and was ranked first, while subjects 24 and 29 scored no 
points and shared the last averaged rank of 30.5. Other ties occurred at totals 
of 18, 10, 9, 8, 5, 4, and 3 points. Five subjects tied at 9 to share a rank of 14 
(average of 12-16) and five other subjects tied at 8 to share rank 19 (average
of 17-21). Except for three subjects tied at 18 points, the remaining ties oc- 
curred in pairs. 

The large number of ties in Ranking A made it likely that this ranking 
would produce a low association with written test ranks. Thus Rankings B 
and C were developed to differentiate between subjects. Subjects with tied
totals earned different numbers of points in subscores of the scoring system, so
the investigator ranked subjects by their subtotals for Approach, Plan, and
Result: Aj was equal to the sum of the Approach scores for subject j across the
six problems; Pj was equal to the sum of the Plan scores; and Rj was equal to
the sum of the Result scores. Thus, subject j who achieved scores of (1, 1, 0),
(1, 2, 2), (0, 0, 0), (1, 2, 1), (1, 1, 1), and (1, 1, 2) for his Approach, Plan,
and Result, respectively, attained subscores of Aj = 5, Pj = 7, and Rj = 6.

Ranking B was based on Aj, Pj, and Rj, but gave priority to subjects
who demonstrated an understanding of the most problems. By this system, the
highest Aj score was ranked first. In case of ties, the subject with the highest Pj
score received the next rank. If subjects were still tied, then the highest Rj
received the next rank. If ties existed for all three scores, the ranks were 
averaged. 

Ranking C was similar to Ranking B, but it emphasized the subject's
plans and processes. The Pj scores of subjects were used first to determine a 
ranking, and the Aj and Rj scores were compared in that order if ties occurred. 
Table 4 presents the Aj, Pj, and Rj scores, the total scores, and Rankings A, B, 
and C. 
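
The three ranking rules described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the investigator's own procedure: the helper name average_ranks is hypothetical, and the (Aj, Pj, Rj) subtotals are transcribed from Table 4. Ranking A orders subjects by the sum of the three subtotals, Ranking B by Aj, then Pj, then Rj, and Ranking C by Pj, then Aj, then Rj. Tied subjects share the mean of the ranks they occupy.

```python
def average_ranks(keys):
    """Rank from the largest key (rank 1) down; tied keys share the mean rank."""
    order = sorted(range(len(keys)), key=lambda i: keys[i], reverse=True)
    ranks = [0.0] * len(keys)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next key equals the current one.
        while end + 1 < len(order) and keys[order[end + 1]] == keys[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


# (Aj, Pj, Rj) subtotals for subjects 1-31, as read from Table 4.
subscores = [
    (5, 5, 4), (2, 3, 4), (2, 3, 3), (5, 7, 6), (3, 4, 1), (1, 1, 1),
    (2, 2, 1), (2, 3, 4), (6, 7, 6), (2, 2, 3), (4, 3, 1), (5, 3, 1),
    (2, 2, 2), (2, 1, 1), (6, 10, 8), (1, 2, 1), (3, 4, 3), (2, 2, 1),
    (4, 7, 7), (3, 3, 2), (3, 6, 4), (3, 4, 3), (3, 3, 3), (0, 0, 0),
    (4, 6, 7), (5, 8, 7), (5, 7, 6), (2, 1, 0), (0, 0, 0), (2, 4, 2),
    (4, 3, 2),
]

ranking_a = average_ranks([a + p + r for a, p, r in subscores])
ranking_b = average_ranks([(a, p, r) for a, p, r in subscores])  # Aj first
ranking_c = average_ranks([(p, a, r) for a, p, r in subscores])  # Pj first

subject = 12
print(ranking_a[subject - 1], ranking_b[subject - 1], ranking_c[subject - 1])
# 14.0 7.0 14.0
```

Subject 12 shows why the schemes differ: a high Approach subtotal lifts this subject to rank 7 under Ranking B, while the modest Plan subtotal leaves the rank at 14 under Rankings A and C.
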
Table 4

Interview Test Scores and Rankings A, B, and C

          Approach   Plan       Result     Total
          subtotal   subtotal   subtotal   interview          Ranking
Subject      Aj         Pj         Rj      test score     A       B       C

   1          5          5          4         14          8       6       9
   2          2          3          4          9         14ᵃ     19.5ᵃ   19.5ᵃ
   3          2          3          3          8         19ᵃ     21      21
   4          5          7          6         18          5ᵃ      4.5ᵃ    4.5ᵃ
   5          3          4          1          8         19ᵃ     15      12
   6          1          1          1          3         28.5ᵃ   29      29
   7          2          2          1          5         24.5ᵃ   24.5ᵃ   24.5ᵃ
   8          2          3          4          9         14ᵃ     19.5ᵃ   19.5ᵃ
   9          6          7          6         19          3       2       3
  10          2          2          3          7         22      22      22
  11          4          3          1          8         19ᵃ     11      16
  12          5          3          1          9         14ᵃ      7      14
  13          2          2          2          6         23      23      23
  14          2          1          1          4         26.5ᵃ   26      27
  15          6         10          8         24          1       1       1
  16          1          2          1          4         26.5ᵃ   28      26
  17          3          4          3         10         10.5ᵃ   13.5ᵃ   10.5ᵃ
  18          2          2          1          5         24.5ᵃ   24.5ᵃ   24.5ᵃ
  19          4          7          7         18          5ᵃ      8       6
  20          3          3          2          8         19ᵃ     17      18
  21          3          6          4         13          9      12       8
  22          3          4          3         10         10.5ᵃ   13.5ᵃ   10.5ᵃ
  23          3          3          3          9         14ᵃ     16      17
  24          0          0          0          0         30.5ᵃ   30.5ᵃ   30.5ᵃ
  25          4          6          7         17          7       9       7
  26          5          8          7         20          2       3       2
  27          5          7          6         18          5ᵃ      4.5ᵃ    4.5ᵃ
  28          2          1          0          3         28.5ᵃ   27      28
  29          0          0          0          0         30.5ᵃ   30.5ᵃ   30.5ᵃ
  30          2          4          2          8         19ᵃ     18      13
  31          4          3          2          9         14ᵃ     10      15

Note. Subtotals were a subject's partial scores summed across the six interview
problems.

ᵃ Ties occurred.

As can be seen in Table 4, Rankings A, B, and C agree on the ranks
assigned to subjects 7, 10, 13, 15, 18, 24, and 29 and are similar in the other
ranks. Since four pairs of subjects had identical subscores, Rankings B and C
each produced four pairs of ties, and any other ranking system based on
ordering Aj, Pj, and Rj would have had similar results.

Audio- vs. videotaping. The physical differences between audio- and
videotaping are immediately apparent. Instead of a single tape recorder which 
the observer can operate alone, videotaping requires at least one camera, spe- 
cial lighting, and a technical assistant. To effectively capture a subject's actions 
and writing, more than one prefocused camera or a single camera which can 
be refocused is needed. Compared to audiotaping, the equipment and technical 
assistance necessary for videotaping is more costly to the investigator and per- 
haps more distracting to the subject. 

In this study, the disadvantages of videotaping were offset by the vari- 
ety of information which could be captured. Physical actions, nervous habits, 
and unspoken problem-solving procedures were noted on the videotape. For 
example, subjects reread the problem or parts of it silently, but the video 
record clearly indicated their behavior as they followed the sentences with 
their eyes or pencil, moved their lips, or asked a question immediately after 
staring at a problem. The 16 videotaped subjects produced 95 of these silent 
rereading behaviors, which would not have been evident on audiotape. 

Another problem-solving strategy easily missed on audiotape occurred 
when subjects drew or modified a diagram without verbally indicating their 
actions. Problem 5 on the IT was solved by five audiotaped subjects with a 
sketch of a ladder, but the coder had to rely on completed diagrams and the 
subjects' verbalizations to speculate on modifications made during the solution
attempts for the audiotaped subjects. 

While the advantages of videotaping for recording subject behaviors 
were obvious, performance differences due to the videotaping procedure were 
possible. The investigator suspected that videotaped subjects spent less time
solving the test problems and that their haste resulted in lower scores than the 
audiotaped subjects earned. These suspicions gave rise to hypotheses one 
(HI) and two (H2): 

Hypothesis H1: The mean score on achievement for videotaped subjects
equals the mean score on achievement for audiotaped subjects.

Hypothesis H2: The mean solution time of the videotaped subjects
equals the mean solution time of the audiotaped subjects.

The ANOVA for H1 and H2 are reported in Tables 5 and 6,
respectively.

Table 5

Source          df        MS         F        p

Treatments       1       0.24      .006     1.00
Error           29      38.31

Table 6

Analysis of Variance for Subjects' Total Solution
Times on the Interview Test

Source          dfᵃ       MS         F        p

Treatments       1     101.00      3.97      .10
Error           27      25.44

ᵃ Due to erasure of tape, two subjects' protocols could not be timed.

As Table 5 indicates, null hypothesis H1 could not be rejected. The
very low F ratio of .006 was an indirect result of the close similarity of the
videotaped and audiotaped subjects' total scores. The videotaped subjects' mean
score was 9.7 with a standard deviation of 5.8, while the audiotaped subjects 
achieved a mean score of 9.9 with a standard deviation of 6.2. 
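
The analysis of variance for H1 can be retraced from the total interview test scores. The sketch below is an illustration only: the function name one_way_anova is hypothetical, and the two score lists are transcribed from the total score column of Table 4, with subjects 1 to 16 videotaped and subjects 17 to 31 audiotaped.

```python
def one_way_anova(groups):
    """Return (ms_between, ms_within, f_ratio) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total

    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Total interview test scores as read from Table 4.
videotaped = [14, 9, 8, 18, 8, 3, 5, 9, 19, 7, 8, 9, 6, 4, 24, 4]  # subjects 1-16
audiotaped = [10, 5, 18, 8, 13, 10, 9, 0, 17, 20, 18, 3, 0, 8, 9]  # subjects 17-31

ms_b, ms_w, f_ratio = one_way_anova([videotaped, audiotaped])
print(f"MS treatments = {ms_b:.3f}, MS error = {ms_w:.3f}, F = {f_ratio:.4f}")
```

With 1 and 29 degrees of freedom, the mean squares and F ratio computed this way agree with the Table 5 entries to within rounding in the second decimal place.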

As shown in Table 6, the significance level of .05 was not reached and 
H2 could not be rejected. However, the F ratio of 3.97 was significant below 
the .10 level, and the analysis suggested there were some treatment differences.
The videotaped subjects' mean solution time was 16.7 minutes and the
audiotaped subjects' mean time was 13.0 minutes, contradicting the
investigator's belief that the subjects performing in front of a camera may have worked
more hastily. 

Lucas (personal communication) suggested that coding videotaped 
protocols took less time than coding audiotaped protocols. His observation was 
tested with hypothesis H3. 

Hypothesis H3: The mean coding time for audiotaped protocols equals
the mean coding time for videotaped protocols. 

The analysis of variance of these data is reported in Table 7. 



Table 7

Analysis of Variance for Coding Times

Source          dfᵃ       MS         F       p <

Treatments       1       0.68      .002     1.00
Error           27     292.09

ᵃ Due to erasure of tape, two coding times could not be measured.



Table 7 illustrates that the extremely low F ratio of .002 did not reach
the .10 significance level. Thus, H3 was not rejected. The means of 42.3
(videotaping) and 42.6 (audiotaping) minutes of coding time per subject and
standard deviations of 17.3 (videotaping) and 15.8 (audiotaping) indicated that
coding time distributions were nearly identical. However, the videotaped protocols
lasted 251 minutes and took 635 minutes to code, while the audiotaped
protocols were 182 minutes long and took 597 minutes to code. Thus, 1 minute of
audiotape took an average of 3.28 minutes to code, but 1 minute of videotape
took only 2.53 minutes to code. Coding 1 minute of videotape took only about
77% as long as coding 1 minute of audiotape, a savings of approximately 23%.
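
The per-minute coding rates follow directly from the four totals given in this paragraph. The short Python sketch below repeats that arithmetic; the dictionary names are illustrative only.

```python
# Totals reported in the text, in minutes.
tape_minutes = {"videotape": 251, "audiotape": 182}
coding_minutes = {"videotape": 635, "audiotape": 597}

# Minutes of coding needed per minute of recorded protocol.
rate = {medium: coding_minutes[medium] / tape_minutes[medium] for medium in tape_minutes}
relative = rate["videotape"] / rate["audiotape"]

for medium, value in rate.items():
    print(f"{medium}: {value:.2f} coding minutes per taped minute")
print(f"videotape rate relative to audiotape rate: {relative:.2f}")
```

The ratio of the two rates comes out near 0.77, so the per-minute saving from videotape is a little under one quarter.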

Statistical Analyses of Rankings 

The feasibility of using a written instrument as a substitute for the 
interview and coding procedure depended upon the relationships between the 
data from the written tests and the interview tests. Two written tests, the WT 
and the WT2, were administered and three rankings, A, B, and C, were
developed from the IT. The initial statistical findings are reported next, followed by
an explanation of the exploratory procedures used to seek additional rankings. 

Relationships of the written and interview tests. The rankings from the
written and interview tests yielded two possible comparisons. A Pearson
product-moment correlation coefficient r_xy (Hays, 1963, p. 497) was computed
between the raw scores (number correct) on the written tests and the
interview test total and subtotal scores. For each correlation coefficient, a
hypothesis that the population statistic ρ_xy equals zero was tested by a t-test
with N - 2 degrees of freedom.
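
As an illustration of this computation, the Python sketch below correlates the WT number-correct scores of the 31 interviewed subjects with their total interview test scores and converts the coefficient to a t value with N - 2 degrees of freedom. The function names are hypothetical, and the two lists are transcribed from Tables 2 and 4. The WT entry for subject 15 is taken as 11, the value consistent with the Group A mean in Table 1.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# WT number correct (Table 2) and total interview test score (Table 4),
# listed in subject order 1-31.
wt_correct = [8, 9, 7, 9, 9, 7, 5, 12, 6, 13, 5, 3, 3, 3, 11, 4,
              13, 9, 14, 4, 9, 7, 2, 2, 12, 13, 14, 4, 5, 2, 6]
it_total = [14, 9, 8, 18, 8, 3, 5, 9, 19, 7, 8, 9, 6, 4, 24, 4,
            10, 5, 18, 8, 13, 10, 9, 0, 17, 20, 18, 3, 0, 8, 9]

r = pearson_r(wt_correct, it_total)
t = t_for_r(r, len(wt_correct))
print(f"r = {r:.2f}, t = {t:.2f}, df = {len(wt_correct) - 2}")
```

The coefficient obtained from these transcribed scores is about .61, in line with the value reported for the WT and the total IT score.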

In addition to the correlation between scores, the relationship between
the rankings developed from the tests was also measured. Kendall's τ (Hays,
1963, p. 642) with ties was computed for the association between the
rankings, and the significance level of τ was found by computing z values. Because
of ties within rankings, Goodman and Kruskal's γ statistic (Hays, 1963, p.
655) was computed to provide a simpler interpretation of Kendall's τ.
The correlations and rankings statistics are presented in Table 8. 
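
Both rank statistics rest on counts of concordant and discordant pairs. The sketch below shows one common way to compute them in Python. It assumes the tie-corrected form of Kendall's coefficient usually called tau-b, since the text cites Hays (1963) without reproducing the formula, and the two short rankings are hypothetical rather than taken from Table 2 or Table 4.

```python
import math


def concordance_counts(x, y):
    """Count concordant, discordant, and singly tied pairs for two rankings."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both rankings; used by neither statistic
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    return concordant, discordant, tied_x_only, tied_y_only


def kendall_tau_b(x, y):
    """Kendall's tau with the usual correction for ties (tau-b)."""
    c, d, tx, ty = concordance_counts(x, y)
    return (c - d) / math.sqrt((c + d + tx) * (c + d + ty))


def goodman_kruskal_gamma(x, y):
    """Gamma ignores every tied pair: (C - D) / (C + D)."""
    c, d, _, _ = concordance_counts(x, y)
    return (c - d) / (c + d)


# Hypothetical rankings of five subjects, each with one averaged tie.
written_rank = [1, 2.5, 2.5, 4, 5]
interview_rank = [2, 1, 3, 4.5, 4.5]

print(round(kendall_tau_b(written_rank, interview_rank), 2))          # 0.67
print(round(goodman_kruskal_gamma(written_rank, interview_rank), 2))  # 0.75
```

Because gamma drops tied pairs from its denominator, it is never smaller in absolute value than tau-b computed on the same data.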

Table 8

Correlation and Ranking Statistics for the
Interview Test and the Written Tests

                                r_xy       τ       p(τ)      γ

WT and Ranking A                .61*      .44      .001     .48
WT and Ranking B                .40**     .33      .007     .34
WT and Ranking C                .59*      .39      .002     .41
WT2 and Ranking A               .64*      .49      .001     .52
WT2 and Ranking B               .48**     .38      .002     .40
WT2 and Ranking C               .61*      .45      .001     .46
(WT + WT2) and Ranking A        .68*      .50      .001     .52

* Significant at the .001 level in a two-tailed t-test of H0: ρ_xy = 0.
** Significant at the .05 level in a two-tailed t-test of H0: ρ_xy = 0.




As reported in Table 8, none of the correlation coefficients between the
seven pairs of written and interview test scores attained the desired minimum
of .71, although the combined scores of the WT and the WT2 produced an
encouraging correlation coefficient of .68 with the total IT score. Two pairs of
scores (WT and Ranking A; WT2 and Ranking C) each produced a correlation
of .61. All seven correlation coefficients resulted in t-test values significant
at the .05 level. Thus, the hypothesis that no correlation exists between written
and interview test scores was rejected.

The associations between the rankings reported in Table 8 resulted in 
low but statistically significant values. Kendall's τ ranged from a low of .33
for WT and Ranking B to a high of .50 for (WT + WT2) and Ranking A.
Kruskal's γ ranged from .34 for WT and Ranking B to .52 for WT2 and
Ranking A and (WT + WT2) and Ranking A.

Exploratory Procedures 

Latent partitioning and clustering were the statistical analyses used to 
find underlying patterns among subjects and to possibly produce other ranking 
schemes. Because the computer program for latent partitioning was not
available, Guttman-Lingoes multidimensional scaling was substituted. A
similarity measure D based on subscores for Approach, Plan, and Result was
computed between each pair of the 31 subjects and was used in both analyses.

The Guttman-Lingoes multidimensional scaling program (Lingoes,
1973) searches for underlying patterns or structures among similarity
measures. The program then represents the structure in a spatial model by
assigning coordinates to the subjects and computes stress values to measure the
agreement between the order of the spatial distances and the order of the
similarity measures. High agreement is indicated by low stress values. A second
measure, the coefficient of alienation, deals with a type of monotonicity
criterion for the relationship between distance and similarity measures. The
coordinates, stress values, and coefficients of alienation for one, two, three,
and four dimensions were produced by the Guttman-Lingoes program. The
one-dimension results accommodated a ranking which closely paralleled Ranking A.

Johnson's (1967) max clustering algorithm was the second exploratory
procedure used to group subjects according to a structure underlying the
similarity measures. The program defines a sequence of partitions of a set of
objects and uses similarity values to determine diameters of the subset. The
max procedure constructs hierarchical partitions containing subsets of
minimum diameter and assigns a partition rank to each pair of objects. Inspection
of Johnson's clustering results revealed a pattern strongly resembling the
ranking scheme developed from one-dimensional scaling.
Conclusion

This study attempted to find valid procedures for measuring students'
mathematical problem-solving achievement. The commercial tests which were
examined seemed inadequate to assess that achievement. Taped thinking
aloud interviews and an associated coding system capture and classify
mathematical problem-solving behaviors much better than commercial tests, but
these complex interview procedures are not feasible for large scale use. A
written test having high concurrent validity with experimental interview results
would be a useful alternative. The first question this study examined was the
feasibility of producing such a written test.

The physical and statistical qualities of the WT and WT2 indicated 
that they were suitable for administration to seventh-grade students in
classrooms. Experimental Groups A and B required, on an average, less than 27
minutes to complete the WTs, and no great deviation would be expected when
parallel forms of this test are used by other seventh-grade classes. The average
Hoyt reliability (Hoyt, 1941) of both written tests across all groups was an
acceptable .79. 

The feasibility of the written test was measured by its prediction of
seventh graders' problem-solving performance and ranks on the IT. The
product-moment correlation coefficient was .61 between the IT and WT ranks and
.64 between the IT and WT2 ranks. Though both values were highly
significant (p < .001), neither the WT nor the WT2 attained the minimum
correlation of .71. Thus, the written test was declared not presently feasible for
predicting mathematics problem-solving performance as measured by the 
thinking aloud procedure and coding scheme. Future research could improve 
the test statistics by replicating this study with a larger population, using a 
longer written test, using more mathematical problems on the interview test, 
using a revised scoring system, or screening the WT items and IT problems to 
select only those which have high correlation to other items. 

The second main question of the study was, "Is it possible to assess,
separate, and rank seventh graders according to their problem-solving
protocols?" The answer appears to be positive. A variation of Lucas's (1972)
coding system was applied to verbal problem-solving protocols with a high degree
of agreement and reliability. Rankings A, B, and C were derived from the
scores awarded by Lucas's point system and provided high rank order
agreement measures. The order imposed by Ranking A was consistent with
similarities and patterns detected among the subjects by scaling and clustering
analyses. Statistics comparing rankings also indicated a high degree of agreement.
Future research will be needed to refine the application of multidimensional
scaling and clustering procedures to measures of mathematical
problem-solving achievement.
Probably the most important finding of this study was the answer to
the first research question. The question was "How well do the thinking aloud
procedure and related coding scheme capture and classify the mathematical
problem-solving behaviors of seventh graders?" The answer appears to be,
"Not very well." The behavior of the students during the thinking aloud
interviews raised critical questions about the reliability and validity of the
information recorded. The seven subjects who were obviously anxious were unlikely to
exhibit their normal problem-solving behaviors. An additional six subjects
gave more subtle indications that they were anxious. Therefore, almost one
third of the subjects were not performing normally. Other subjects who had
difficulty talking while thinking add to the suspicion that the procedure did not
adequately represent problem-solving behaviors and that it may not be highly
valid or reliable with seventh graders. Systematic examination beginning with
first graders and continuing through adults should detect general trends in
ability to think aloud with increased mental maturity.

Videotaping has a distinct advantage over audiotaping because it can
detect silent rereading, drawing and altering diagrams, and written
computation. Videotaped protocols also take less time to code per minute of tape.
Future investigators will need to decide if the extra information and time saved is
worth the expense of videotaping. The author's dissertation contains a more
complete account of this study (Zalewski, 1974).
Chapter 8

Development of a Test of 
Mathematical Problem-solving 
which Yields a Comprehension, 
Application, and Problem-solving 
Score 

Diana G. Wearne 



Mathematicians and mathematics educators agree on the importance
of developing the problem-solving abilities of children. The Cambridge
Conference on School Mathematics (Educational Services, Inc., 1963), the
College Entrance Examination Board (1959), and the National Advisory
Committee on Mathematical Education (NACOME) (1975), among others, all
stressed the importance of problem solving in school mathematics programs.

Following these recommendations, problem solving has become
prominent in text series. One such series, Developing Mathematical Processes
(DMP) (Romberg, Harvey, Moser, & Montgomery, 1974, 1975, 1976),
views problem solving as the vehicle for achieving its program goals (Romberg
& Harvey, 1969). DMP is a research based, individually guided instructional
program in elementary mathematics developed by the staff of the Analysis of
Mathematical Instruction Project at the Wisconsin Research and
Development Center for Cognitive Learning at the University of Wisconsin. The
authors refer to the program's activity approach to learning as learning through
problem solving.

There has been some disagreement on the type of problems to include
in a mathematics program. Kline (1973) and others have consistently and
broadly criticized the application problems in mathematics texts as having
little in common with real life situations. Nelson and Kirkpatrick (1975) also
have emphasized real life situations. Others believe the real life category to be
too restrictive and have advocated any problem available for mathematical
analysis (NACOME, 1975). However, there is no disagreement on the
importance of including problem-solving activities in mathematics programs.

Polya's (1962) frequently quoted statement on the importance of
problem solving voices the feelings of virtually all mathematics educators: 

What is know-how in mathematics? The ability to solve problems — 
not merely routine problems but problems requiring some degree of in- 
dependence, judgment, originality, and creativity, (p. viii) 
In addition to advocating the inclusion of problem-solving material in
school mathematics programs, mathematics educators are engaged in research
on problem-solving behavior, particularly the heuristics of problem solving. It
appears that a measure of problem-solving ability is needed to determine how 
well problem-solving abilities are being developed. The instrument described 
in this chapter was developed in response to this need. 

Background of the problem 

An individually administered test of problem-solving behavior not only 
produces a score but also provides an opportunity to observe the child solving
the problem. The child can be asked how the problem was solved, or if the 
child was unsuccessful, what path of reasoning was followed and what aspects 
of the problem were confusing. 

However, the limited amount of time usually allowed for assessing 
problem-solving behavior makes a group-administered test necessary. In 
group-administered testing, however, the examiner is unable to identify which 
subjects were unable to solve the problem because they did not understand the
information presented, had not mastered the concepts or processes needed, or
could not apply the prerequisite concepts or processes even though they knew
them. 

A cursory examination of existing group-administered tests of
problem-solving behaviors reveals that the authors of these tests apparently have
defined problem solving in terms of verbal, mainly one-step, problems. The 
operation required to solve the problem is frequently implied by the wording 
of the problem itself; for example, asking "What is the area of . . . ?" or 
"How much more . . . ?" 

The Stanford Achievement Test (Kelley, Madden, Gardner, & Rudman,
1964) contains a section entitled "Applications." However, the
Directions for Administering, Intermediate I Battery refers to that portion of the
tests as measuring problem solving. Examples from the section include:

2. Don is delivering papers to earn more money. He had 150 papers to
deliver an hour ago. He has delivered 90 of them now. How many
are left to deliver?

25. If two pencils cost 15¢, how many can you buy for 30¢?

Intermediate I Battery is designed for children in grade 4 and the first 
half of grade 5. 

The Modern Math Understanding Test, Form Multilevel Edition
(Science Research Associates, 1966) classifies 12 items as being problem-solving
or application problems. All of the items may be described as simple
applications or concept assessments. Examples from this test are as follows:

9. On a certain map a distance of 1 inch represents 200 miles. If the
distance between two towns is 1½ inches on the map, how many
miles apart are the towns?

17. Which numeral must be placed in the box to make the following
sentence true?

6 × 2 = (□ × 2) + (1 × 2)

35. The perimeter of this rectangle is ____.

(A pictured rectangle is shown with the measurements 11 inches and
4 inches on adjoining sides.)

Other items in this set refer to the concepts of relatively prime numbers and 
equivalent fractions and to adding the lengths of line segments together.

Other tests such as the Iowa Test of Basic Skills (Hieronymus & Lindquist,
1971) and the Metropolitan Achievement Tests (Durost, Bixler,
Wrightstone, Prescott, & Balow, 1971a, 1971b) also contain problem sets
identified as assessing problem-solving behaviors. The comments made about
the Stanford Achievement Test and the Modern Math Understanding Test
apply to these tests as well.

An alternative to the standardized tests is a test consisting of items
which conform to the investigators' definition of problem solving (Kilpatrick,
1967; Zalewski, 1974); Zalewski's study is reported in Chapter 7.

Studies have reported that the primary factors related to success in
problem solving are reading to note details, understanding of the vocabulary,
mastery of the necessary computation skills, and knowledge of the relevant
mathematical concepts (Chase, 1960; Johnson, 1944). Alexander (1960) and
Treacy (1944) reported that good and poor problem solvers differed in aspects
of reading. Specific instruction in quantitative vocabulary was found by
Vanderlene (1964) to increase problem-solving scores. Bogolyubev (1972)
noted that children could misconstrue words in verbal problems, thus
misunderstanding the problem to be solved. In another study, Egan and Green
(1973) reported that individual differences in prerequisite knowledge were
more important for "discovery" learning and creative problem solving than
for "rule" learning. Norman (1950) found that merely possessing a necessary
computational skill did not imply a child would be able to solve verbal
problems using that skill.

The research reported above underscores the possibility that lack of
familiarity with the vocabulary in a question or not possessing an appropriate
level of concept attainment may result in an incorrect response to a
problem-solving question.

The Test

When presented with a problem-solving task, ideally the child should
understand the information in the task description and have the prerequisite
skills for solving the problem. Without this understanding and skill, there is no
certainty that a child cannot solve a problem simply because he or she has
responded incorrectly. For example, if the task involves the concept of area and 
the child is unfamiliar with this concept, a wrong answer will not necessarily 
indicate inability to solve the problem, but merely that the child is unfamiliar 
with the underlying concept. 

The test described in this chapter sought to evaluate the child's under- 
standing of the vocabulary and mastery of the prerequisite concepts or 
processes of each problem-solving question. Such a test could yield more infor- 
mation about the child and provide a "truer" measure of problem-solving abil- 
ity by considering, as a measure of that ability, only those problems for which 
the child has mastered the prerequisites. 

The test was to produce three scores: a comprehension score, an appli- 
cation score, and a problem-solving score. To accomplish this, the test con- 
tained clusters of items called superitems; each superitem consisted of a com- 
prehension item, an application item, and a problem-solving item. 
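The three-score structure can be sketched in code. This is only an illustration under stated assumptions: the class name SuperitemResponse, the function score_test, and the sample responses are hypothetical, and each scale score is assumed to be a simple count of correct responses, which the chapter does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class SuperitemResponse:
    """One child's responses to a single superitem (True = correct)."""
    comprehension: bool
    application: bool
    problem_solving: bool


def score_test(responses):
    """Return the three scale scores as counts of correct responses."""
    return {
        "comprehension": sum(r.comprehension for r in responses),
        "application": sum(r.application for r in responses),
        "problem_solving": sum(r.problem_solving for r in responses),
    }


# Hypothetical child answering three superitems (not data from the study).
child = [
    SuperitemResponse(True, True, False),
    SuperitemResponse(True, False, False),
    SuperitemResponse(True, True, True),
]
scores = score_test(child)
print(scores)
```

Keeping the three responses together in one record per superitem also makes it straightforward to examine, later, how responses within a superitem depend on one another.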

The comprehension item assessed the child's understanding of the
implicit or explicit information in the item stem. The application item assessed
the child's mastery of a prerequisite concept or skill of the problem-solving
item by applying it in a fairly straightforward way. The third item was the
problem-solving one.

A problem situation was defined as a situation which posed a question 
whose solution was not immediately available; that is, a situation which did 
not lend itself to immediate application of some rule or algorithm. An effort 
was made to construct short items that did not appear impossible for the child
to solve. A guide for the construction might be found in Hilbert's (1906)
comment:

A mathematical problem should be difficult in order to entice us, yet not 
completely inaccessible, lest it mock at our efforts. It should be to us a 
guidepost on the hazy paths to hidden truths and ultimately a reminder 
of our pleasure in the successful solution, (p. 59) 

Although the application and problem-solving items referred to a common
unit of information, the item stem, the items were independent to the
extent that the response to the application item was not needed to answer the
problem-solving item.

Examples of Superitems 

The first example of a superitem appears in Figure 1. The first of the
three questions comprising the superitem is the comprehension item. This
seeks to determine if the child understands the information on the map;
specifically, that the numbers on the map are distances in miles between towns.

The second item is the application item. To respond correctly, the child
must be able to read the map, identify the distances referred to by the question,
and correctly add these distances together.

To respond correctly to the problem-solving item, the final one in this
superitem, the child must be able to read the map, add two distances, and find a
position on the map corresponding to the given distances.

[Map of towns, including Bright and Elmtown, with distances in miles marked
between them; map not reproduced]

The distance from Alta to Bright is:
     7 miles
     12 miles
     16 miles
     19 miles

The shortest distance from Alta to Drago is:
     through Bright
     through Cable
     through Elmtown
     through Fiagge

... should be placed:
     in Drago
     in Alta
     in Fiagge
     in Cable

Figure 1. Example 1 of superitems.

The second example of a superitem is shown in Figure 2. Once again,
the first item is the comprehension item. This assesses the child's understanding
of the information presented in the item stem. The application item is a
direct application of the information in the item stem. The problem-solving
item asks the child to arrive at a generalization of the information in the item
stem.

[Item stem and questions not reproduced; surviving response options:
4,562; 45,620; 45,260]

Figure 2. Example 2 of superitems.

Development of the Test 

A further impetus in developing a test of problem solving was the
desire of the Analysis of Mathematics Instruction (AMI) project staff to include
a problem-solving measure as one component of the terminal accountability
tests for Developing Mathematical Processes (DMP) (Romberg et al., 1974,
1975, 1976).

Because the test described here was to be a model for the problem- 
solving portion of the DMP terminal accountability tests, constraints were 
imposed on the application and problem-solving questions. The application 
items had to measure mastery behaviors of the program and the problem- 
solving items had to measure behaviors beyond the mastery level of children at 
the end of the fourth grade; the test described in this chapter was designed for 
fourth-grade children. For example, in DMP children are expected to master
finding the area of a rectangle or a figure composed of rectangular regions by
the end of the fourth grade; children are not expected to master finding the
area of a nonrectangular figure or a figure not composed of rectangles. Thus,
finding the area of a rectangle would be an appropriate application item and
finding the area of a nonrectangular region would constitute a problem-solving
question.

Questions Raised by the Format of the Test

A test composed of superitems yields more information than one
containing only problem-solving questions; however, structuring the test in this
manner raises some questions.

The questions are as follows: 

1. Does asking a series of questions have a facilitating or debilitating
effect on the response to the questions? In particular, does the inclusion of a
comprehension and an application item have an effect on the response to the
problem-solving item?

2. How should the reliability of the test be estimated?

3. What type of validity will be obtained?

4. To what extent is the model for the test supported by the test
results? That is, are correct responses to the comprehension and application
items required for a correct response to the problem-solving question?

A discussion of the procedures followed in investigating these questions
is contained in the next section.

Table 1

Description of the Tests

                      Type of items

Test     Comprehension     Application     Problem-solving

C              X                -                 -
A              -                X                 -
P              -                -                 X
CA             X                X                 -
CP             X                -                 X
AP             -                X                 X
CAP            X                X                 X



The Results 

The Effect of the Superitem Format on Item Response

The investigation focused on the effect of the comprehension, applica- 
tion, and problem-solving items upon each other; that is, the effect of asking 
multiple questions on the same unit of information upon the response to those
questions. The inclusion of the comprehension and application items provides 
more information than a test containing only the problem-solving items. How- 
ever, if including the additional items affects the response to the problem-solv- 
ing items, then the strength of the problem-solving test is diminished. 

To determine the effect of the items upon each other, six tests consisting 
of subsets of the original set of items were constructed. Three of the tests con- 
tained only one of the three types of items; the remaining three tests contained 
two of the three types of items. Thus, there were a total of seven tests, the six
tests containing subsets of the items and the complete test. The content of the 
tests is described in Table 1. 
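The seven test forms are the non-empty subsets of the three item types. A minimal sketch of that enumeration follows; the function name enumerate_forms is hypothetical, and the letter codes follow the C, A, P labels used for the tests.

```python
from itertools import combinations

ITEM_TYPES = ("C", "A", "P")  # comprehension, application, problem-solving


def enumerate_forms(item_types=ITEM_TYPES):
    """List every non-empty subset of the item types, smallest subsets first."""
    forms = []
    for size in range(1, len(item_types) + 1):
        for combo in combinations(item_types, size):
            forms.append("".join(combo))
    return forms


forms = enumerate_forms()
print(forms)
```

Three item types give 2^3 - 1 = 7 forms: three single-type tests, three two-type tests, and the complete test.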

Each of the subset tests was administered to approximately 50 children.
To minimize teacher effect as much as possible, each test was administered
in four classes. Each of the four classes was in a different school except
for two classes in the same school who took the C test. The children in each
class were randomly assigned to one of two instruments. The complete test was
administered either by the investigator or by a person who had been trained in
responding to the children's questions; the other six tests were administered by
the investigator. The number of children taking each test is given in Table 2.

Table 2

Number of Children Taking Each of the Tests

                              Test

School      C     A     P    CA    CP    AP   CAP   Total

1          10    14     -     -    13    11     -      48
2           -    12    12    14     -    11     -      49
3           -     8     7    11     9     -     -      35
4          19    13    25    10    14    12     -      93
5           9     3     5     7    10    12     -      46
6           -     -     -     -     -     -    90      90
7           -     -     -     -     -     -    53      53
8           -     -     -     -     -     -    68      68
9           -     -     -     -     -     -    37      37
10          -     -     -     -     -     -    31      31
11          -     -     -     -     -     -    38      38

Total      38    50    49    42    46    46   317     588

The population for this study consisted of fourth-grade children from
Wisconsin who had been using DMP (Romberg et al., 1974, 1975, 1976) for
at least 1½ years. Those who took the complete test (CAP) either attended
one of four schools in a city of 40,000 which is a suburb of a large city or they
attended one of two schools in a suburb of a medium-sized city. The children
taking the subtests attended schools in the following locations: a city of
population 50,000, a small town, a suburb of a medium-sized city, and a
medium-sized city.

It was difficult to determine an administration time for the six subtests.
The administration time of a complete test is 45 minutes. However, it was not
possible to merely apportion time based upon the number of items in a subtest.
Two assumptions affected the administration time:

1. It was assumed the problem-solving items require more response
time than the application items which, in turn, require more response time
than the comprehension items.

2. Since several items frequently shared the same item stem, it was
assumed that the child did not have to process two pieces of information and,
hence, would not need the sum of the times needed to respond to the items
individually.

Using these assumptions, the administration times chosen were as follows: 20
minutes for the C test, 25 minutes for the A test, 30 minutes for the P test, and
35 minutes for the CA, CP, and AP tests. The administration times are
summarized in Table 3.

Table 3

Test Administration Time

Test     Administration time
            (in minutes)

C               20
A               25
P               30
CA              35
CP              35
AP              35
CAP             45

As noted previously, the test contains three types of items; for the
remainder of this paper, these categories of items will be referred to as scales.
The number of subjects, means, variances, and reliability estimates for the
Comprehension, Application, and Problem-solving scales on each of the tests
containing them are reported in Table 4.

Table 4

The Means, Variances, and Reliability Estimates of the Scales
on Each Test Containing the Scale

Scale    Test    Number     Mean    Variance    Reliability

C        C           38    15.95        6.97        .49
         CA          42    17.02        7.54        .58
         CP          46    15.93       10.06        .65
         CAP        317    15.53        9.74        .63

A        A           50    11.08       17.22        .78
         CA          42    10.50       15.43        .76
         AP          46     8.61       15.09        .75
         CAP        317    10.01       13.15        .71

P        P           49     4.24       11.02        .74
         CP          46     2.93        4.37        .49
         AP          46     3.46        7.81        .69
         CAP        317     3.30        6.13        .60

CA       CA          42    27.52       38.74        .82
         CAP        317    25.54       37.41        .80

CP       CP          46    18.87       18.96        .69
         CAP        317    18.82       22.53        .73

AP       AP          46    12.07       38.86        .84
         CAP        317    13.31       31.26        .80

CAP      CAP        317    28.84       62.18        .84

The means of the scales on the various tests were compared to
determine if the items had an effect on one another. Consequently, it was more
serious to neglect to identify a significant difference than to identify more
significant differences than actually exist. Stated another way, a Type II error
was of greater consequence than a Type I error. To avoid a Type II error a
more generous alpha level was chosen than would be used if one were
primarily interested in avoiding a Type I error. The results of the analysis of
variance for each of the three scales are summarized in Table 5.

Table 5

ANOVA Summary Table for Scores

Scores                          SS      df       MS        F

Comprehension Scores
   Between groups            86.98       3    28.99     3.11*
   Within groups           4098.70     439     9.33

Application Scores
   Between groups           157.42       3    52.47     3.75*
   Within groups           6309.09     451    13.99

Problem-solving Scores
   Between groups            48.25       3    16.08     2.42
   Within groups           3015.51     454     6.64

*p < .05.
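The F ratios in Table 5 can be checked from the sums of squares and degrees of freedom. The sketch below assumes the usual one-way analysis of variance arithmetic (mean square = sum of squares / degrees of freedom; F = mean square between / mean square within); the function name f_ratio is hypothetical, and the input values are those reported in Table 5.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way analysis of variance."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# (SS between, df between, SS within, df within) as reported in Table 5.
table5 = {
    "Comprehension": (86.98, 3, 4098.70, 439),
    "Application": (157.42, 3, 6309.09, 451),
    "Problem-solving": (48.25, 3, 3015.51, 454),
}

results = {scale: f_ratio(*row) for scale, row in table5.items()}
for scale, (ms_b, ms_w, f) in results.items():
    print(f"{scale}: MS between = {ms_b:.2f}, MS within = {ms_w:.2f}, F = {f:.2f}")
```

The within-groups degrees of freedom also agree with Table 4: for the Comprehension scale, 38 + 42 + 46 + 317 children less 4 groups gives 439.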

An a posteriori multiple comparison test was used to determine whether
the means on the same scale differed significantly across the tests
containing the scale. Normally one would use a planned comparison
procedure rather than a post hoc procedure when it is essential that a Type II
error not be committed. However, all pairs of means were to be used, and this
use violates the independence required by a planned comparison test.
Independence is not required by post hoc procedures. The procedure used in this
study was Scheffé's (1953) method of comparing all possible means.

The probability of overlooking a true difference from zero is greater in
the post hoc procedures than in the planned method. Therefore, an alpha level
of .10 was selected. The probability of obtaining at least one spuriously
significant comparison using a post hoc procedure equals alpha (Hays, 1973).
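A pairwise Scheffé comparison can be sketched as follows. The chapter does not give the formula used, so this assumes the standard textbook form: the squared difference between two means, divided by the within-groups mean square times (1/n_i + 1/n_j), divided again by (k - 1), and compared with the critical F on (k - 1, df within) degrees of freedom. The function name is hypothetical; the means and group sizes are taken from Table 4 and the within-groups mean squares from Table 5.

```python
def scheffe_pairwise_f(mean_i, n_i, mean_j, n_j, ms_within, k):
    """Scheffe statistic for one pair of means, scaled by (k - 1).

    Compare the result with the critical F on (k - 1, df_within)
    degrees of freedom at the chosen alpha level.
    """
    contrast_sq = (mean_i - mean_j) ** 2
    error = ms_within * (1.0 / n_i + 1.0 / n_j)
    return contrast_sq / (error * (k - 1))


K = 4  # each scale appears on four tests

# Comprehension scale: CA test versus CAP test.
comp_ca_vs_cap = scheffe_pairwise_f(17.02, 42, 15.53, 317, 9.33, K)
# Application scale: A test versus AP test.
appl_a_vs_ap = scheffe_pairwise_f(11.08, 50, 8.61, 46, 13.99, K)
# Comprehension scale: CA test versus CP test.
comp_ca_vs_cp = scheffe_pairwise_f(17.02, 42, 15.93, 46, 9.33, K)

for label, value in [("C: CA vs CAP", comp_ca_vs_cap),
                     ("A: A vs AP", appl_a_vs_ap),
                     ("C: CA vs CP", comp_ca_vs_cp)]:
    print(f"{label}: {value:.2f}")
```

With 3 and roughly 440 to 450 degrees of freedom, standard F tables give critical values of approximately 2.1 at alpha = .10 and 2.6 at alpha = .05 (approximate figures, not taken from the chapter), so under these assumptions the first two comparisons exceed both values and the third does not.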

Two differences were significant at the .10 level, both of which were
also significant at the .05 level. One of the significant differences was found
among the comprehension scores and one among the application scores; no
significant differences were found among the problem-solving scores. The
significant difference among the comprehension scores was between the mean
comprehension score on the CA test and the mean comprehension score on the
CAP test. The significant difference among the application scores was between
the mean application score on the A and the AP tests.

Two hypotheses may be advanced to account for these differences. One
hypothesis is that the items have an effect, under certain conditions, upon one
another; the other is that the children did not have the same amount of time to
respond to a particular group of items on all the tests containing that group.
These hypotheses will be examined in turn.

The higher mean comprehension score on the CA test may have been
due to a facilitating effect of the application items. This effect could occur if the
child responded to the comprehension item after responding to the application
item, a retroactive effect, or if the application items had a stimulating effect
upon the response to the comprehension items. The CAP test also contained
the application items but it could be that the facilitating effect of the
application items on this test was counterbalanced by a debilitating effect of the
problem-solving items or that a retroactive effect does not take place in the
presence of the problem-solving items. However, if a debilitating effect was
produced by the problem-solving items, this effect should also have appeared on
the CP test, which was not the case. A possible argument still remains that the
debilitating effect of the problem-solving items upon the comprehension items
only takes place in the presence of the application items; such an intricate
dependency is possible, but unlikely.

The second hypothesis concerning the administration times of the tests
offers another possible explanation for the significant difference. The children
taking the CA test may have had more time to respond to the comprehension
items than the children taking the other three tests containing the
comprehension items. The CA and CP tests both had an administration time of 35
minutes; however, the application items are assumed to be less difficult than the
problem-solving items. Thus, the children may well have had more time to
respond to the comprehension items on the CA test than on the CP test.
Although the mean comprehension score on the CA test was not significantly
different from the mean comprehension score on the CP test, the mean on the
CP test differed from the mean comprehension score on the CAP test by .40
points. The administration time for the CAP test was 45 minutes, 10 minutes
longer than the administration time of the CA test, but the CAP test included
the problem-solving items. The problem-solving items are assumed to be more
difficult and, hence, require more response time than the other two types of
items. Therefore, of the two possible explanations for the significant
differences among mean comprehension scores proposed in the preceding
paragraphs, the more reasonable appears to concern the administration time
of the tests.

The significant difference among mean application scores was between
the score on the A test and on the AP test. One possible explanation for this
significant difference is that the problem-solving items on the AP test had a
debilitating effect upon the response to the application items. However, this
same effect did not occur on the CAP test though it can be argued that the effect
of the problem-solving items on the CAP was tempered by the effect of the
comprehension items on the application items. This interdependency may be
in effect but does not appear likely.

Once again, another explanation for the difference lies in the
administration time allotted for the tests. The children had 25 minutes to respond to
the application items on the A test but only 35 minutes to respond to both the
application and the problem-solving items on the AP test. The CAP test
contained the comprehension items in addition to all the items contained on the
AP test. Although the comprehension items require the least response time,
the administration time of the CAP test was 10 minutes longer than that of the
AP test. This provided more time for the children to respond to the application
and problem-solving items than they had on the AP test. The difference
between the mean application scores on the A and CAP tests was not significant.

Thus, of the two hypotheses advanced to explain the significant
difference existing between the mean application scores on the A and AP tests, the
more reasonable appears to be the effect of the time allotted to respond to the
items on the test.

Conditional Probabilities Associated with the Items 

The superitem model assumes that the comprehension and application 
items assess prerequisite behaviors for the problem-solving item. The data for 
determining how well the superitems fit the model are presented in this
section.

In the model, a correct response to the comprehension item was a
prerequisite to responding correctly to the application item, which in turn was a
prerequisite to answering the problem-solving item. To determine to what
extent this was true, the following conditional probabilities were computed:

Prob (Comprehension item correct | Application item correct)

Prob (Comprehension item correct | Problem-solving item correct)

Prob (Application item correct | Problem-solving item correct)

Prob (Comprehension and Application items correct | Problem-solving
item correct)

The values for the Prob (Comprehension item correct | Application
item correct) ranged from .63 to 1.00 with a mean conditional probability of
.86. The Prob (Comprehension item correct | Problem-solving item correct)
varied from .52 to 1.00 with a mean probability of .82. The mean conditional
probability of Prob (Application item correct | Problem-solving item correct)
was .67; the probabilities ranged from 0 to .95. The Prob (Comprehension
and Application items correct | Problem-solving item correct) ranged from 0
to .95 with a mean probability of .62. The conditional probabilities for the
items are listed in Table 6.

Table 6

Conditional Probabilities Associated with the Superitems

Item     P(c|a)¹    P(c|p)²    P(a|p)³    P(c∩a|p)⁴

1          .98        .99         ?          .83
2          .98        .99         ?          .91
3          .90        .85         ?          .74
4          .78        .80         ?          .63
5          .95        .97         ?          .92
6          .81        .74         ?           ?
7          .79        .77        .90         .71
8          .96       1.00        .80         .80
9          .91        .91        .83         .78
10         .70        .76        .47          ?
11         .78        .71        .76         .62
12         .85        .77        .62         .52
13         .63        .77        0           0
14         .93        .97        .80         .77
15        1.00        .90        .29         .29
16         .72        .59        .24         .24
17        1.00       1.00        .95         .95
18         .91        .97        .83         .7?
19         .72        .52        .13         .11
20         .94        .86        .64         .64
21         .75        .74        .67         .50
22         .98        .94        .86         .86

Conditional probabilities:
¹Prob (comprehension item correct | application item correct).
²Prob (comprehension item correct | problem-solving item correct).
³Prob (application item correct | problem-solving item correct).
⁴Prob (comprehension and application items correct | problem-solving item correct).
A question mark marks a value or digit that is illegible in the source.
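Each of these conditional probabilities is a ratio of counts: among the children who answered the conditioning item correctly, the proportion who also answered the other item (or items) correctly. A minimal sketch follows; the function name and the eight response records are hypothetical and are not data from the study.

```python
def conditional_probability(records, event, given):
    """Estimate P(event | given) as a ratio of counts; None if undefined."""
    conditioning = [r for r in records if given(r)]
    if not conditioning:
        return None  # nobody satisfied the conditioning event
    return sum(1 for r in conditioning if event(r)) / len(conditioning)


# Hypothetical responses to one superitem: (c, a, p), True = correct.
records = [
    (True, True, True),
    (True, True, True),
    (True, False, True),
    (True, True, False),
    (False, True, False),
    (True, False, False),
    (False, False, False),
    (True, True, False),
]

p_c_given_a = conditional_probability(records, lambda r: r[0], lambda r: r[1])
p_c_given_p = conditional_probability(records, lambda r: r[0], lambda r: r[2])
p_a_given_p = conditional_probability(records, lambda r: r[1], lambda r: r[2])
p_ca_given_p = conditional_probability(
    records, lambda r: r[0] and r[1], lambda r: r[2]
)
print(p_c_given_a, p_c_given_p, p_a_given_p, p_ca_given_p)
```

Because each estimate rests only on the children who answered the conditioning item correctly, a very difficult problem-solving item leaves few cases in the denominator and the estimate becomes unstable.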

A partial ordering of the superitems based upon their conditional
probabilities for each of the four conditional probabilities of interest is
contained in Table 7.

Table 7

Ordering of the Conditional Probabilities of the Superitems

Conditional
probability     P(c|a)               P(c|p)              P(a|p)            P(c∩a|p)

                                    Superitem numbers

.90 - 1.00      1,2,3,5,8,           1,2,5,8,9,          2,5,7,17          2,5,17
                9,14,15,17,          14,15,17,
                18,20,22             18,22

.80 - .89       6,12                 3,4,20              1,3,6,8,9,        1,8,22
                                                         14,18,22

.70 - .79       4,7,10,11,           6,7,10,11,          11                3,7,9,14,18
                16,19,21             12,13,21

.60 - .69       13                                       4,12,20,21        4,6,11,20

.50 - .59                            16,19                                 12,21

.40 - .49                                                10                10

.30 - .39

.20 - .29                                                15,16             15,16

.10 - .19                                                19                19

.00 - .09                                                13                13



For a superitem to agree with the model, all four of the conditional
probabilities should reflect that agreement. Superitems were divided into three
categories based upon their conditional probabilities; the divisions were
arbitrarily selected by the investigator. If .75 is a minimum conditional probability
at which to consider a superitem acceptable, then 10 of the superitems satisfied
this criterion; that is, 10 of the superitems had four conditional probabilities of
at least .75. There were seven additional superitems whose conditional
probabilities were not all at least .75 but were all at least .50; these superitems
were considered marginally acceptable. Five superitems did not have all four
conditional probabilities of at least .50. Table 8 lists the numbers of the
superitems categorized as Acceptable, Marginally Acceptable, and Unacceptable.
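The categorization rule amounts to classifying each superitem by the smallest of its four conditional probabilities. A minimal sketch, with a hypothetical function name and the .75 and .50 cutoffs stated above, applied to three superitems whose values are fully legible in Table 6:

```python
def categorize(probabilities, acceptable=0.75, marginal=0.50):
    """Classify a superitem by the smallest of its conditional probabilities."""
    lowest = min(probabilities)
    if lowest >= acceptable:
        return "Acceptable"
    if lowest >= marginal:
        return "Marginally acceptable"
    return "Unacceptable"


# P(c|a), P(c|p), P(a|p), P(c and a|p) as listed in Table 6.
table6_rows = {
    17: (1.00, 1.00, 0.95, 0.95),
    11: (0.78, 0.71, 0.76, 0.62),
    19: (0.72, 0.52, 0.13, 0.11),
}

categories = {item: categorize(p) for item, p in table6_rows.items()}
print(categories)
```

Superitem 11 illustrates the middle category: two of its probabilities exceed .75, but the lowest (.62) falls between .50 and .75.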

Table 8

Categorization of the Items on the Basis
of Their Conditional Probabilities

                                       Superitems with all four of the
                   Conditional         conditional probabilities at
Category           probability         that level

Acceptable         .75 - 1.00          1, 2, 3, 5, 8, 9, 14, 17, 18, 22

Marginally         .50 - .74           4, 6, 7, 11, 12, 20, 21
acceptable

Unacceptable       .00 - .49           10, 13, 15, 16, 19

There were five superitems whose conditional probabilities placed
them in the Unacceptable category. They were superitems 10, 13, 15, 16, and
19. Four of these superitems contained problem-solving items which ranked
among the six most difficult problem-solving items on the test. The difficulty
levels of the problem-solving items for superitems 10, 13, 15, 16, and 19 were
.05, .04, .07, .05, and .26, respectively; the mean difficulty level of all the
problem-solving items was .15. These five superitems also contained application
items which were among the six most difficult on the test. The difficulty levels
of the application items were .43, .05, .08, .09, and .16 for superitems 10, 13,
15, 16, and 19, respectively; the mean difficulty level of all the application
items on the test was .46. Therefore, for four of the five superitems, at most 7%
of the children responded correctly to the problem-solving portion and for four
of the five superitems, at most 16% of the children responded correctly to the
application portion. These difficulty values may have contributed to the low
conditional probabilities in that the probabilities were based on very small
samples. For example, only 13 children responded correctly to the
problem-solving item of superitem 13 from a sample of 317 children.

An examination of the five superitems has led the author to conclude
that superitems 13 and 16 may have been placed in the Unacceptable category
for reasons other than failure to fit the model. (For a detailed discussion of the
superitems, see Wearne, 1976.)

Validity

Due to the lack of established criteria against which the tasks could be
validated, the only indicant of validity available was content validity as judged 
by a panel of experts. 

A test may be said to have content validity if it measures something 
which a group of authorities asserts it does measure. The American
Psychological Association and the American Educational Research Association in
their joint publication Standards for Educational and Psychological Tests and
Manuals define content validity as follows:

Content validity is demonstrated by showing how well the content of the 
test samples the class of situations or subject matter about which conclu- 
sions are to be drawn (American Psychological Association, 1966, p. 
12). 

Thus, evaluating the content validity of a test is tantamount to evaluating the 
adequacy of a definition. 

A panel of six judges was selected on the basis of their familiarity with
the DMP materials and interest in problem-solving research. A constraint on
developing superitems for the test had been suitability for fourth-grade
children in the DMP program. Thus, the judges had to be familiar with the
content of the DMP program to judge whether a required behavior represented
routine application of a concept or algorithm (application item) or if the
behavior represented a nonroutine application (problem-solving item).

The judges were given the definitions of the three categories of items
(comprehension, application, and problem-solving) and the items from the
test in a random order. They were asked to classify the items as comprehension
items, application items, or problem-solving items. The definitions given the
judges were as follows:

1. Comprehension Item: This item assesses the child's understanding 
of the information contained either implicitly or explicitly in the item stem. 
When the item is assessing information contained implicitly in the item stem, 
it may be thought of as assessing the understanding of the definition of a basic 
concept underlying the situation. 

2. Application Item: This item is a fairly straightforward application 
of some rule or concept to a situation. This item may be thought of as assessing
what is considered to be a mastery behavior in the DMP program at the end of 
the fourth grade. 

3. Problem-solving Item: A problem situation is defined to be a
situation which poses a question whose solution is not immediately available, that
is, a situation which does not lend itself to an immediate application of some
rule or algorithm. This item may be thought of as assessing behavior beyond
the mastery level of DMP at the end of the fourth grade.

The judges' classification of the items was compared to the classifica- 
tion of the items on the test. The mean proportion of judges agreeing with the 
test classification was .84. The mean proportion of agreement with the test 
classification of comprehension items was .89, the mean agreement on the ap- 
plication items was .78, and the mean proportion of judges agreeing with the 
problem-solving classification was .84. Table 9 lists the proportion of judges
agreeing with the test classification for each item. 

There were nine items, out of a total of 66, on which fewer than two- 
thirds of the judges agreed with the test classification of the items; half of the 
judges agreed with the test classification on five of these nine items. One of the 
nine items was a comprehension item, five were application items, and three 
were problem-solving items. The tendency, when disagreeing with the
application category, was to rate the item as problem-solving. Comprehension
and problem-solving items were rated as application items when the judges
disagreed with the test classifications for these items.

A measure of association was computed between the judges' classification
of the items and the classification of the items used when developing the
test. The strength of the association was computed to be .77; the index used
was Cramér's statistic φ' (Hays, 1973, p. 745). The number of items
placed into each of the categories by the judges is shown in Table 10.
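
The agreement proportions and the index of association reported above can be
recomputed from the pooled counts in Table 10. The following is a minimal
sketch, assuming the index is the usual Cramér statistic, that is, the square
root of chi-square divided by N(L - 1), where L is the smaller of the number
of rows and columns. The function names are illustrative and are not part of
the original analysis.

```python
import math

# Judges' classifications pooled over the six judges (rows: test
# classification C, A, P; columns: judges' classification C, A, P).
# Counts follow Table 10; the P/P cell is read as 111 so that each row
# sums to 132 (22 items x 6 judges).
counts = [
    [118, 12, 2],
    [11, 103, 18],
    [0, 21, 111],
]


def cramers_statistic(table):
    """Return sqrt(chi2 / (N * (min(rows, cols) - 1))) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    smaller_dim = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (smaller_dim - 1)))


def agreement_by_category(table):
    """Proportion of classifications on the diagonal, one value per row."""
    return [row[i] / sum(row) for i, row in enumerate(table)]


association = cramers_statistic(counts)
agreement = agreement_by_category(counts)
overall = sum(counts[i][i] for i in range(3)) / sum(map(sum, counts))
```

With these counts the sketch gives an association of about .77 and agreement
proportions of about .89, .78, and .84, with an overall agreement of about
.84, which matches the values reported in the text.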

Table 9

Classification of Items by Judges

                                 Judges'          Proportion of judges
            Test              classification      agreeing with test
Superitem   classification     C     A     P      classification

 1          C                  6     0     0          1.00
            A                  3     1     2           .17
            P                  0     5     1           .17
 2          C                  5     1     0           .83
            A                  2     3     1           .50
            P                  0     4     2           .33
 3          C                  5     0     1           .83
            A                  2     4     0           .67
            P                  0     0     6          1.00
 4          C                  6     0     0          1.00
            A                  1     5     0           .83
            P                  0     0     6          1.00
 5          C                  5     1     0           .83
            A                  0     4     2           .67
            P                  0     0     6          1.00
 6          C                  5     1     0           .83
            A                  0     6     0          1.00
            P                  0     2     4           .67
 7          C                  6     0     0          1.00
            A                  0     4     2           .67
            P                  0     0     6          1.00
 8          C                  5     0     1           .83
            A                  0     6     0          1.00
            P                  0     1     5           .83
 9          C                  6     0     0          1.00
            A                  0     6     0          1.00
            P                  0     0     6          1.00
10          C                  6     0     0          1.00
            A                  0     6     0          1.00
            P                  0     0     6          1.00
11          C                  6     0     0          1.00
            A                  0     6     0          1.00
            P                  0     0     6          1.00
12          C                  6     0     0          1.00
            A                  0     6     0          1.00
            P                  0     0     6          1.00
13          C                  6     0     0          1.00
            A                  0     6     0          1.00
            P                  0     2     4           .67
14          C                  5     1     0           .83
            A                  0     6     0          1.00
            P                  0     2     4           .67
15          C                  6     0     0          1.00
            A                  0     3     3           .50
            P                  0     0     6          1.00
16          C                  4     2     0           .67
            A                  0     2     4           .33
            P                  0     0     6          1.00
17          C                  4     2     0           .67
            A                  0     3     3           .50
            P                  0     0     6          1.00
18          C                  6     0     0          1.00
            A                  0     6     0          1.00
            P                  0     0     6          1.00
19          C                  6     0     0          1.00
            A                  1     4     1           .67
            P                  0     2     4           .67
20          C                  6     0     0          1.00
            A                  1     5     0           .83
            P                  0     0     6          1.00
21          C                  5     1     0           .83
            A                  0     6     0          1.00
            P                  0     0     6          1.00
22          C                  3     3     0           .50
            A                  1     5     0           .83
            P                  0     3     3           .50


Table 10

Total Number of Items Placed into Each
of the Categories by the Judges

Test                 Judges' classification
classification        C       A       P      Total

C                    118      12       2      132
A                     11     103      18      132
P                      0      21     111      132


Of the six judges classifying the items, one judge's classification differed
from the test classification on six of the 66 items. Three other judges
differed on eight of the items. One judge differed 16 times and the remaining
judge differed from the test classification on 18 occasions. Thus, four judges
differed a total of 30 times and the remaining two judges differed a total of
34 times. The four judges who differed the fewest number of times had all been
writers of the DMP program for a minimum of 2 years. Of the two judges who
differed the greatest number of times, one had been a writer for only 1 year
and had been involved with the revision of the K-2 materials rather than the
3-4 materials; the test contains more material related to the 3-4 component
than the K-2 component. The judge who differed 18 times with the test's
classifications had been associated with the DMP program since its inception
but not as a writer of curriculum materials. It is quite possible that the
familiarity gained by writing the materials may have made those four judges
more aware of what constitutes mastery behavior of the program than the two
judges who were not involved with writing the 3-4 materials. Classification of
the application and problem-solving items rests upon mastery objectives of the
program.

No one judge consistently rated items as being in a higher or lower 
category than the classification of the items on the test. However, the two com- 
prehension items rated as representing a problem-solving situation were rated 
by the same judge. 

Reliability 

Two estimates of reliability of the test were computed, the Hoyt reliability
and the KR-20 reliability. The Hoyt reliability was computed under the
assumption that the test items were independent. However, if an indication of
dependency is found, then the Hoyt estimate would be rendered unreliable.
To satisfy this contingency, a generalized KR-20 was also computed for the
test. This procedure was suggested by Cureton (1965) when discussing the
problems associated with computing reliabilities of a test consisting of
superitems.

In the generalized KR-20, n is the number of superitems, σ² is the score
variance on each of the superitems, and σ²(test) is the variance of the
scores on all the superitems.
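
A generalized KR-20 of this kind is commonly written in the coefficient-alpha
form, n/(n - 1) times one minus the ratio of the summed superitem variances to
the total score variance. The following is a minimal sketch under that
assumption; the function name and the scores are hypothetical and serve only
to show how the three quantities defined above enter the computation.

```python
from statistics import pvariance


def generalized_kr20(superitem_scores):
    """Coefficient-alpha form: n/(n-1) * (1 - sum(var_i) / var_total).

    superitem_scores: one list per child, holding that child's score on
    each superitem (for example 0-3 when a superitem has three items).
    """
    n = len(superitem_scores[0])            # number of superitems
    columns = list(zip(*superitem_scores))  # scores grouped by superitem
    sum_item_var = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(child) for child in superitem_scores])
    return (n / (n - 1)) * (1 - sum_item_var / total_var)


# Hypothetical scores for five children on three superitems.
scores = [
    [3, 2, 3],
    [2, 2, 1],
    [1, 0, 1],
    [3, 3, 2],
    [0, 1, 0],
]
reliability = generalized_kr20(scores)
```

Because each superitem is treated as a single scoring unit, covariance among
the three items inside a superitem is counted as superitem variance rather
than as between-unit covariance.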

Hoyt reliability estimates were also computed for each of the six subset
tests as well as for each of the three scales, Comprehension, Application, and
Problem-solving, contained in the six tests. Thus, a reliability estimate of
the comprehension scores was obtained on the C, CA, CP, and CAP tests. The
reliabilities for the Comprehension scale ranged from .49 on the C test to .65
on the CP test. The reliability for the Application scale ranged from .71 on
the CAP test to .78 on the A test. The reliability estimates for the
Problem-solving scale ranged from .49 on the CP test to .74 on the P test.

The reliabilities computed for the CA, CP, and AP tests were .82, .69,
and .84, respectively. The reliabilities of these scales on the CAP were .80,
.73, and .80, respectively. The reliabilities are given in Table 4.

The question of independence of the items was discussed in the section
The effect of the superitem format on item response. The conclusion reached in
this section was that the items did not have any effect upon one another and
those effects that were noted were artifacts of the administration time. Thus,
it may be assumed that the reliability estimate of .84 as reported by the
Hoyt, which requires independent items, does not represent an inflated
estimate of reliability.

The difference between the more conservative reliability estimate of .79
obtained by using a generalized KR-20 and the Hoyt estimate of .84 may exist
because when items were placed into groups of three, as they were to form the
superitems, variance among the items which had been part of the item
covariance in the Hoyt estimate constitutes part of the item variance in the
KR-20. Consequently, the KR-20 reapportions the total variance of the test
with a greater proportion of the variance now being part of the item variance
rather than the covariance of the items. This results in the ratio of item
variance to total variance being larger and, hence, a lower reliability
estimate.

Reliabilities were also computed for each scale, Comprehension, Application,
and Problem-solving, on each test containing the scale; estimates of
reliability were obtained for the entire CA, CP, and AP tests as well.

The reliabilities for the Comprehension scale on the four tests contain- 
ing the scale varied from .49 on the C test to .65 on the CP test. The associated 
variances ranged from 6.97 on the C test to 10.06 on the CP test. The low 
reliability estimates are to be expected when taking the variances into account.

The reliability estimates associated with the Application scale ranged
from .71 on the CAP test to .78 on the A test; the variances from 13.15 on the
CAP test to 17.22 on the A test.

The range of reliability estimates reported for the Problem-solving
scale was from .49 on the CP test to .74 on the P test; the associated
variances ranged from 4.37 on the CP test to 11.02 on the P test. The
reliability estimates are a reflection of the skewed distribution of scores on
this scale. The range of the problem-solving scores on the CAP test was from 0
to 15 with a mean of 3.30 and a standard deviation of 2.48. The mean
proportion of children responding correctly to the 22 problem-solving items on
the CAP test was .15.

The reliability estimates for the CA, CP, and AP tests were .82, .69,
and .84, respectively. The reliabilities of these scales on the CAP test were .80, 
.73, and .80 for the CA, CP, and AP tests, respectively. 

One way of increasing the reliability estimates of the Comprehension 
and Problem-solving scales would be to increase the variability in the re- 
sponses to the items. However, to do so would violate the definition of these 
categories in both cases. The purpose of the comprehension item was to assess 
the child's understanding of the information contained, implicitly or explicitly, 
in the item stem. One would expect a relatively high degree of success on this 
item. The range of difficulty levels associated with the comprehension items 
was .35 to .98; the mean item difficulty was .71. An example of a comprehen- 
sion item to which virtually all children responded correctly is in superitem 2; 
96% of the children responded correctly to this item. The item (see Figure 1)
asked the child to determine the number of miles between two towns, a dis- 
tance given on the map. As stated earlier, the purpose of this comprehension 
item is to determine if the child understands that the numbers on the map 
represent distances, in miles, between towns. Therefore, this item is an
appropriate comprehension item. To assess a more complex behavior would exceed
the definition of the item type.

A problem situation was defined as one which poses a question whose
solution is not immediately available, that is, a situation which does not lend
itself to immediate application of some rule or algorithm. This definition
permits a wide latitude in choosing problem situations since the only
constraint is that the solution behavior required is not a mastery behavior
from the first 65 topics of DMP and, at the same time, is one for which the
children have the prerequisite conceptual and computational skills.

Traditionally, investigators of problem-solving behavior have not been
particularly concerned with the reliability of their tests. The investigators
apparently consider the task itself to be more important than any associated
index of consistency. However, when administering a test to a group of
children, an investigator would like to be reasonably confident that the
results are reliable; that is, if the children took the test again, the same
ordering would occur.

Reliability of the Problem-solving scale would increase if the items
were less difficult. However, raising the reliability by deleting challenging
questions would be undesirable. Lord and Novick (1974) make the following
particularly apt comment about reliability.

    Maximizing the reliability may sometimes be an undesirable goal.
    For example, a subset of factual items in an achievement test may yield a
    more reliable score than the total set of items. This can happen, for
    example, if the other items involve such hard-to-measure but important
    traits as reasoning ability and creative thinking. (p. 344)

Secondary Analyses of the Data 

An Item Analysis

No treatise on a test is complete without a discussion of an item analysis
for that test. The analysis for this test was performed using the Generalized
Item and Test Analysis Program (GITAP) (Baker, 1963). This procedure
yields the following parameters: item difficulty, biserial correlation, X50,
and β.

The item difficulty is the proportion of children responding correctly to
an item. The biserial correlation (item-criterion correlation) is obtained by
hypothesizing the existence of a continuous latent variable underlying the
dichotomy imposed in scoring the item. It is assumed the distribution of
measures in the sample for which binary values are given is actually normal but
that, at some point in the distribution, a separation has been made with those
cases lying above the point being assigned a score of 1 (correct) and those
below a score of 0 (incorrect). One may think of the biserial correlation as a
sample measure of association for the item score and the total test score.
Under the assumption that the scores are normally distributed and the
regression of total test score on the item score is linear, the biserial
correlation may be viewed as an estimate of the product moment correlation
between the item score and the total test score.
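
The two statistics just described can be illustrated with a short computation.
This is a minimal sketch using the standard textbook formula for the biserial
correlation, not a reproduction of the GITAP routines; the function names, the
use of the population standard deviation, and the scores are assumptions made
for illustration.

```python
from statistics import NormalDist, mean, pstdev


def item_difficulty(item_scores):
    """Proportion of examinees answering the item correctly (scores are 0/1)."""
    return sum(item_scores) / len(item_scores)


def biserial_correlation(item_scores, total_scores):
    """Biserial r between a dichotomous item and the total test score.

    Uses r_bis = (M1 - M0) / s * p * q / y, where M1 and M0 are the mean
    total scores of those passing and failing the item, s is the standard
    deviation of all total scores, p is the item difficulty, q = 1 - p,
    and y is the standard normal ordinate at the point cutting off
    proportion p of the distribution.
    """
    p = item_difficulty(item_scores)
    q = 1 - p
    passed = [t for x, t in zip(item_scores, total_scores) if x == 1]
    failed = [t for x, t in zip(item_scores, total_scores) if x == 0]
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean(passed) - mean(failed)) / pstdev(total_scores) * p * q / y


# Hypothetical item responses and total scores for eight children.
item = [1, 1, 0, 1, 1, 0, 0, 1]
total = [20, 18, 15, 9, 17, 11, 8, 14]
difficulty = item_difficulty(item)
r_bis = biserial_correlation(item, total)
```

Because the biserial coefficient rests on the normality assumption, values
computed from small or markedly non-normal samples can fall outside the usual
range of a product moment correlation.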

The X50 and β are also obtained by hypothesizing the existence of a
continuous latent variable underlying the dichotomy imposed in scoring the
item. The regression of the hypothetical item score on the criterion is called
the item characteristic curve. The X50 is the point on the criterion scale,
given in standard deviation units, corresponding to the median of the item
characteristic curve. β is the reciprocal of the standard deviation of the
item characteristic curve. One may think of β as the slope of the item
characteristic curve at the point X50, although this is technically not true.
Figure 3 is an illustration of a typical item characteristic curve.

Figure 3. Typical item characteristic curve.

The biserial correlation and β are related in the following manner, under
the assumption the criterion is normally distributed:

$$\beta = \frac{r_{bis}}{\sqrt{1 - r_{bis}^{2}}}$$
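
The relation between the two indices can be checked numerically. This is a
minimal sketch, assuming the relation is β = r_bis / sqrt(1 - r_bis²), which
is consistent with the paired values listed in Table 11; the function names
are illustrative.

```python
import math


def beta_from_biserial(r_bis):
    """Item discrimination beta implied by a biserial correlation."""
    return r_bis / math.sqrt(1 - r_bis ** 2)


def biserial_from_beta(beta):
    """Inverse relation: biserial correlation implied by beta."""
    return beta / math.sqrt(1 + beta ** 2)


# Biserial values of .33, .41, and .80 appear in Table 11.
examples = {r: round(beta_from_biserial(r), 2) for r in (0.33, 0.41, 0.80)}
```

The relation is monotone, so ranking items by β or by the biserial
correlation gives the same order; β simply stretches the scale as the
biserial correlation approaches 1.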

The generally accepted criteria for a "good item" are β and biserial
correlation values of at least .30 (Harris, 1968). Obviously, the higher the
value of β, the greater the slope of the item characteristic curve, indicating
the item is discriminating more clearly. Thus, an item to which everyone
responded correctly would be judged a "poor" item. Although one would like
high biserial correlations and a high value for β, it is theoretically
possible for them to be too high. If, for example, an item had a biserial
correlation of 1.00, it would be superfluous to include another item with the
same difficulty level since a subject would respond the same to both items.
Similarly, an increase in the value of β, while the values of other parameters
remain fixed, could lower the precision of an estimate. This is described as
the attenuation paradox (Loevinger, 1 ). Therefore, an ideal test should have
varying values for β.

Table 11

Item Parameters

                      β                        r_bis
Superitem      C       A       P        C       A       P

 1            .04a    .35     .45      .04a    .33     .41
 2            .34     .54     .71      .32     .48     .58
 3            .34     .68     .75      .32     .56     .60
 4            .28a    .44    1.34      .27a    .40     .80
 5            .32     .50     .66      .30     .45     .55
 6            .25a    .47     .63      .24a    .43     .53
 7            .49     .88    1.17      .44     .66     .76
 8            .30     .50    1.71      .28a    .45     .86
 9            .40     .54    1.71      .37     .48     .86
10            .47     .79     .37      .43     .62     .35
11            .29a    .51     .65      .28a    .45     .54
12            .26a    .51     .44      .26a    .45     .40
13            .43    -.04a    .39      .39    -.04a    .36
14            .81     .77     .32      .65     .61     .63
15            .59     .60     .60      .51     .51     .51
16            .63     .60    1.21      .51     .53     .77
17           1.29     .96     .63      .79     .69     .53
18           1.18     .52     .81      .76     .68     .63
19            .98     .38     .20a     .70     .35     .19a
20           1.12     .56     .21a     .75     .49     .21a
21            .94     .81     .35      .69     .63     .33
22            .87     .41     .75      .66     .38     .60

aPoor questions.

Eight of the 66 items on the test had values for β which were less
than the desired level of .30. Five of these eight items were comprehension
items, one was an application item, and two were problem-solving items. Of
these eight items, the value of β for four was between .25 and .29; these
values indicate marginal acceptability. All four items were comprehension
items. The β value for the comprehension item of superitem 1 (Figure 1) was
.04; this is not an unusual value for an item to which 98% of the children
responded correctly. The value of β of the application item of superitem 13
was -.04. None of the children who responded correctly to the problem-solving
item of this superitem responded correctly to the application item. Therefore,
it is not surprising the application item would correlate poorly with the
total test score.

The items with low values for β had biserial correlation values of
approximately the same size. The additional item with a low biserial
correlation was the comprehension item of superitem 8; the biserial
correlation reported for this item was .28, indicating the item is marginally
acceptable. The β and biserial correlation values are given in Table 11.

Table 12

Means on C, A, and P Scale for Each Cluster

                    Number of              Means
Cluster             subjects         C        A        P

1 (-)                  91          11.98     6.30     1.59
2 (0)                 112          15.96     9.30     2.35
3 (+)                  99          17.72    13.17     4.91
4 (++)                 15          19.33    17.00    10.01

Total population      317          15.53    10.01     3.30

In conclusion, it may be said that all but four of the 66 items had
values for β which were acceptable (at least .30) or marginally acceptable
(.25 - .29).
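
The counts in this conclusion follow directly from the β columns of Table 11.
The following is a minimal sketch of that tally; the category labels and the
function name are illustrative, and the cutoffs are the ones stated in the
text (.30 for acceptable, .25 for marginally acceptable).

```python
from collections import Counter

# Beta values by superitem (comprehension, application, problem-solving),
# transcribed from Table 11.
betas = {
    1: (0.04, 0.35, 0.45), 2: (0.34, 0.54, 0.71), 3: (0.34, 0.68, 0.75),
    4: (0.28, 0.44, 1.34), 5: (0.32, 0.50, 0.66), 6: (0.25, 0.47, 0.63),
    7: (0.49, 0.88, 1.17), 8: (0.30, 0.50, 1.71), 9: (0.40, 0.54, 1.71),
    10: (0.47, 0.79, 0.37), 11: (0.29, 0.51, 0.65), 12: (0.26, 0.51, 0.44),
    13: (0.43, -0.04, 0.39), 14: (0.81, 0.77, 0.32), 15: (0.59, 0.60, 0.60),
    16: (0.63, 0.60, 1.21), 17: (1.29, 0.96, 0.63), 18: (1.18, 0.52, 0.81),
    19: (0.98, 0.38, 0.20), 20: (1.12, 0.56, 0.21), 21: (0.94, 0.81, 0.35),
    22: (0.87, 0.41, 0.75),
}


def classify(beta):
    """Label an item by its beta value using the cutoffs given in the text."""
    if beta >= 0.30:
        return "acceptable"
    if beta >= 0.25:
        return "marginal"
    return "poor"


tally = Counter(classify(b) for triple in betas.values() for b in triple)
below_desired = tally["marginal"] + tally["poor"]
```

The tally gives 58 acceptable items, 4 marginally acceptable items, and 4 poor
items, so eight items fall below the desired level of .30.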

The Results of a Cluster Analysis

A clustering procedure was used to discover if there was any structure
(natural arrangement of children into homogeneous groups) inherent in the
data. Investigators using the test may correlate measures of interest with any
of the three scores produced by the test. However, additional insight may be
gained by examining the correlations of specific groups of children with
similar characteristics identified by the test.

A Ward's clustering procedure (Johnson, 1967) was used. The Ward's
procedure is a maximum method clustering, that is, the value of the clustering
is the maximum diameter of the clusters produced. At any step in the
clustering process, the distance from an object in cluster j to an object in
cluster k is the diameter of the cluster which is the union of clusters j and
k. This procedure appeared to produce four clusters.
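
The maximum (diameter) rule described above can be sketched in a few lines.
This is an illustration only, not the program used in the study: it implements
the maximum rule as the text describes it, which is usually called complete
linkage, rather than a minimum-variance criterion. The function name and the
score triples are hypothetical.

```python
import math


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with the maximum (complete-linkage) rule.

    The distance between two clusters is the largest distance between a
    member of one and a member of the other. The closest pair of clusters
    is merged repeatedly until n_clusters remain.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical (C, A, P) score triples for eight children.
scores = [(12, 6, 1), (11, 7, 2), (16, 9, 2), (15, 10, 3),
          (18, 13, 5), (17, 14, 5), (19, 17, 10), (20, 17, 11)]
groups = complete_linkage(scores, 4)
```

In this small example the eight children fall into four pairs with similar
score profiles; with real data the number of clusters is a judgment made by
inspecting where the merging distance rises sharply.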

Cluster 1 consisted of 91 children whose mean comprehension, application,
and problem-solving scores were all below the grand means; this group
was designated as the (-) group. The mean scores of the 112 children in
Cluster 2 were approximately at the mean for all three scores; this group was
called the (0) group. Cluster 3 consisted of 99 children whose mean scores
were above the grand mean for all three categories of items; this group was
called the (+) group. The fourth cluster consisted of 15 children whose
application and problem-solving scores were two standard deviations above the
mean for the entire group of children; it was impossible for comprehension
scores to be two standard deviations above the mean. Cluster 4 was designated
the (++) group. The mean scores for each of the clusters are given in Table
12 and the range of the scores within a cluster is shown in Figure 4.

Figure 4. Range of the scores within each cluster.

Conditional probabilities for the items were examined for the children
in each of the four clusters. The number of items fitting the model was fewer
in the (-) group than in the (++) group; only five superitems were acceptable
in the (-) group, indicating that all four conditional probabilities were at
least .75. Sixteen superitems were acceptable in the (++) group. Ten items
were acceptable or marginally acceptable for Cluster 1, 14 for Cluster 2, 19
for Cluster 3, and 21 for Cluster 4. The conditional probabilities for the
superitems in each cluster are contained in Table 13 through Table 16; the
number of superitems fitting the categories of Acceptable, Marginally
Acceptable, and Unacceptable for each cluster is contained in Table 17.
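
The four conditional probabilities and the three categories can be illustrated
with a short computation for a single superitem. This is a minimal sketch; the
function names and the response data are hypothetical, and it assumes a
superitem is categorized by the smallest of its four conditional
probabilities, using the cutoffs of .75 and .50 given for Table 17.

```python
def conditional_probabilities(responses):
    """Return P(c|a), P(c|p), P(a|p), and P(c and a|p) for one superitem.

    responses: list of (c, a, p) tuples of 0/1 scores, one per child, for
    the comprehension, application, and problem-solving items.
    """
    def prob(event, given):
        base = [r for r in responses if given(r)]
        if not base:
            raise ValueError("no child satisfies the conditioning event")
        return sum(1 for r in base if event(r)) / len(base)

    return {
        "P(c|a)": prob(lambda r: r[0] == 1, lambda r: r[1] == 1),
        "P(c|p)": prob(lambda r: r[0] == 1, lambda r: r[2] == 1),
        "P(a|p)": prob(lambda r: r[1] == 1, lambda r: r[2] == 1),
        "P(c and a|p)": prob(lambda r: r[0] == 1 and r[1] == 1,
                             lambda r: r[2] == 1),
    }


def categorize(probabilities):
    """Categorize a superitem by the smallest of its four probabilities."""
    lowest = min(probabilities.values())
    if lowest >= 0.75:
        return "acceptable"
    if lowest >= 0.50:
        return "marginally acceptable"
    return "unacceptable"


# Hypothetical (c, a, p) responses of eight children to one superitem.
responses = [(1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0),
             (1, 1, 0), (0, 0, 0), (1, 1, 1), (1, 0, 1)]
probs = conditional_probabilities(responses)
category = categorize(probs)
```

Here three of the four children who solved the problem-solving item also
answered the application item correctly, so the superitem just meets the .75
level and is categorized as acceptable.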

A superficial explanation for the trend towards more superitems fitting
the model as the scores of the children increase may be that the conditional
probabilities reflect correct responses to more of the items. That is, if the
children are responding correctly to virtually all of the items, then the
conditional probabilities would necessarily be close to 1.00. However, the
mean problem-solving score for Cluster 4 (++) was 10.01 out of a possible
score of 22; this indicates that even the children in the highest category
were not responding correctly to the majority of the problem-solving items.

Another explanation for the trend towards more of the items fitting the
model as scores increase is that the children do not respond as erratically to
the multiple choice items in the (+) and (++) groups as in the (-) and (0)
groups.



Table 13 

Conditional Probabilities for Cluster 1 



Conditional
probability P(c|a) P(c|p) P(a|p) P(c ∩ a|p)

Superitem numbers



90 - 1 00 1.2.3.5.8.15. 2.5.10.22 1.2.5.8.13, 5.22 

lV.l8.22 17.18.22 

.80 -.89 12.13.14 3 3 9 2 

70 - ^ 6.9.20 1.7 11 'l^ 

60 - 69 11 9.17 12,14.20 17 

fo - :59 4.10 6.11.12.18 4.6J 69.11.18 



.40 - .49 7 



19 12 



!30 - ioti 21 14.20.21 21 20 

.20 - .29 



19 4 

8.10 

15.16.19.21 



10-19 ^ 

.00 - .09 16 13.15.16.19 10.15.16 t^^-^^^'t

Table 14

Conditional Probabilities for Cluster 2 



Conditional
probability P(c|a) P(c|p) P(a|p) P(c ∩ a|p)

Superitem numbers



90 - 1.00 1.2.3.5.?.9, 8.17 1,2.5.8,9, 8.17 

15,17.20.22 13.14.15,16. 

17.18.20 

80 - .89 6. -3.14.18 1.7.3 3.4 1.5 

70 - 79 4.7.12.19.21 2,3.14 6.7.12.21.22 2.3.9.14 

60 - .69 iO 4.6,11,18.22 t0.11 4.22 

50 - .59 11.16 21 6.7.11,18 

40 - .49 IS 

30 - .39 12 21 

20 - .29 15 12.15 

10 - .19 19 19 

00 - 09 10.13.16.20 10.13,16.20 



Table 15 

Conditional Probabilities for Cluster 3 

Conditional 

probability P(c|a) P(c|p) P(a|p) P(c ∩ a|p)

Superitem numbers 



.90 - 1.00 1.2.5.8,9.11. 2.5.6.17.20 1.2.3.5.8.9. 2.5.17 

14.15.17.18. 10.14.15. 

20.22 17.18 

.80 • .89 3.4.7.12.16.19 1,3.7,9.10,14. 4,12.20.22 1.3,9.10,14.18, 

18.20.22 20.22 

.70 - .79 6.10.21 11.12,21 6.7.11.21 6.7 

60 - .69 4.8 4,8.11.12 

.50 - .59 13.19 21 
.40 - .49 

.30 ' .39 16 

20 - 29 13 15 15 

.10 - .19 19 

.00 - .09 13.16 13,16.19

Table 16

Conditional Probabilities for Cluster 4 



Conditional 
probability 



.90 - 1.00 



.80 - .89 
.70 - .79 
.60 - .69 
.50 - .59 
.40 - .49 
.30 - .39 
.20 - .29 
.10 - .19 
.00 - .09 



P(c| a) 



1.2.4.5>6.7.8. 
9.10.12.13. 
14.15.16.17. 
18.20,21,22 

11 
19 
3 



P(c|p) 



P(a| p) 



P(c n a| p) 



Superitem numbers 



1.2>3.4.5,6, 

7.9.10,11.12. 

17/8.21.22 



8.14 



15.20 
16.19 



13 



2.4.5.7.8. 
9.10.12.13. 
14.15.17.18. 
19.21>22 

1.6.11 
16 
20 
3 



2.5.7.9.10. 

12.17.18. 

21.22 



1.4.6.8.11.14 



15.20
3,16.19

Table 17

Categorization of the Items on the Basis
of Their Conditional Probabilities by Cluster

                                          Number of items with all four
                                      conditional probabilities at that level
Category                Probability   Cluster 1  Cluster 2  Cluster 3  Cluster 4

Acceptable              .75 - 1.00        5          5         12         16
Marginally acceptable   .50 - .74         5          9          6          5
Unacceptable            .00 - .49        12          8          4          1


Results of a Second Clustering Procedure

A second clustering procedure was performed, this time disregarding
the comprehension scores. Two factors prompted the decision to omit the
comprehension scores: the conditional probabilities and the length of the
test. The mean conditional probability of responding correctly to a
comprehension item following a correct application response is .86; if the
four lowest conditional probabilities are omitted the mean conditional
probability is .00. A child is reasonably certain to respond correctly to a
comprehension item after responding correctly to the application item of that
superitem.

The second factor influencing the decision to examine the data for
structure without the comprehension scores was the length of the test. The test
consists of 66 items to be administered within a 43-minute time period.
Deleting the comprehension items would release response time for the
application and problem-solving items without increasing the total
administration time. Deleting the comprehension items is a fairly serious step
as these items indicate understanding of the information in the item stem and,
incidentally, provide the child with items which are easier at regular
intervals. Due to these two factors, it was interesting to see if a clustering
procedure would produce meaningful groups when the comprehension items were
omitted and, if so, to what extent the groups differed from those produced by
the initial clustering process.

The initial clustering process produced four groups, three of similar
size, 91, 99, and 112, and one small group of 15. The second clustering
procedure also appeared to yield four clusters; however, one of these groups
was almost one-half of the total group, 147 of the 317 children. Two other
groups were approximately the same size, 82 and 72, and there was one small
group of 16.

The four new clusters may be designated in a manner similar to those
formed in the initial clustering. Cluster A, the largest, consisted of children
whose mean application and problem-solving scores were below the grand
mean; this group will be designated as the (-*) group. Cluster B consisted of
children whose mean application and problem-solving scores were similar to
the mean scores for the entire group; this group will be called the (0*) group.
Those in Cluster C had mean application and problem-solving scores which
were above the mean; this group will be called the (+*) group. Cluster D,
again the smallest, consisted of those whose mean application and
problem-solving scores were two standard deviations above the mean for the
entire group; this group will be designated the (++*) group. The mean scores
of the children in the clusters are shown in Table 18 and the range of the
scores within a cluster is presented in Figure 5.

Table 18

Means on A and P Scale for Each Cluster

                    Number of          Means
Cluster             subjects         A        P

1 (-*)                147           7.10     1.52
2 (0*)                 82          10.71     3.28
3 (+*)                 72          13.61     5.44
4 (++*)                16          17.00    10.00

Total population      317          10.01     3.30


The change in the groups from the initial clustering to the second
clustering was as follows:

1. There were 147 children in Cluster A; 87 of these were in Cluster 1
and 60 in Cluster 2.

2. Cluster B consisted of 82 children; four of these were in Cluster 1,
50 in Cluster 2, and 27 in Cluster 3.

3. All but one of the children in Cluster C were in Cluster 3; the
remaining child was in Cluster 2.

4. Fifteen of the 16 children in Cluster D were in Cluster 4; one child
was in Cluster 3.

Therefore, it may be said that the comprehension scores had the effect of
placing six children in a lower category and 89 in a higher category. This is
apparent if one considers the mean scores of the children in the (0) cluster.
These scores were slightly below the mean while those in the (0*) cluster are
approximately at the mean.

There is one important difference between the clusters produced using 
two scores and those using all three; the range of the problem-solving scores 
within a cluster is smaller when the comprehension scores are omitted than 
when they are used. That is, the groups are more homogeneous with respect to 
the problem-solving scores. In the initial clustering procedure, there were chil- 
dren in the ( - ) group, for example, who had problem-solving scores above the 
mean. They had responded correctly to fewer application and comprehension 
items than most, yet at the same time responded correctly to more
problem-solving items than most; there were four such children. The range of
the problem-solving scores for Clusters 1, 2, 3, and 4 was 7, 8, 8, and 7,
respectively. The range for the clusterings in which the comprehension scores
were omitted was 3, 6, 7, and 8 for Clusters A, B, C, and D, respectively.
This difference may provide slightly different results when other measures are
correlated with the scores of children within a cluster.

Figure 5. Range of the scores within each cluster (application scores and
problem-solving scores for Cluster A (-*), Cluster B (0*), Cluster C (+*),
and Cluster D (++*)).

Table 19

Intercorrelations of the Items

          C        A        P

C         1
A       .641       1
P       .431     .667       1


The Complexity of the Items

An insight into the relative complexity of the three types of items is
provided by Guttman (1954). Guttman's theory is concerned with studying
the order of complexity of a set of variables. The order of complexity is
determined by the intercorrelations of the variables. The intercorrelations in
a perfect simplex are generated by the law r_jk = a_j/a_k, where r_jk is the
correlation between the jth and kth variables and a_j and a_k are the simplex
loadings for these variables. If the intercorrelations form a perfect simplex,
the variables are said to have a simple order of complexity.

The correlation between the comprehension and application items was
.641; it was .667 between the application and problem-solving items, and .431
between the comprehension and problem-solving items. The matrix of
intercorrelations is shown in Table 19.

The intercorrelations almost form a perfect simplex; the correlation
between the comprehension and the problem-solving scores would have had to
be .428 rather than .431 to be a perfect simplex. Consequently, the
problem-solving items are more complex than the application items which, in
turn, are more complex than the comprehension items. This relationship
appears to give further support to the hypothesis that the items on the
test fit their definitions.
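
The .428 figure can be checked directly. The sketch below assumes the
simplex law in the quotient form $r_{jk} = a_j / a_k$, under which the
correlation between the two end variables equals the product of the two
adjacent correlations, $r_{CP} = r_{CA} \, r_{AP}$. The variable names are
illustrative; the three correlations are those of Table 19.

```python
# Intercorrelations from Table 19
# (C = comprehension, A = application, P = problem solving).
r_ca = 0.641
r_ap = 0.667
r_cp = 0.431

# Assumption: in a perfect simplex r_jk = a_j / a_k for j < k, so
# r_CP = (a_C / a_A) * (a_A / a_P) = r_CA * r_AP.
predicted_r_cp = r_ca * r_ap
discrepancy = r_cp - predicted_r_cp

print(f"predicted r_CP under a perfect simplex: {predicted_r_cp:.3f}")
print(f"observed r_CP:                          {r_cp:.3f}")
print(f"discrepancy:                            {discrepancy:.3f}")
```

The observed value departs from the simplex prediction only in the third
decimal place, which is the sense in which the matrix "almost" forms a
perfect simplex.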

It has been noted previously that the items are presented in order of
difficulty; however, there is a difference between more difficult and more
complex. Difficulty in test theory connotes a relationship between group
means. Consequently, it is always possible to structure tests to make one
type of behavior more difficult than another. Complexity, however, is
defined in terms of correlation coefficients and a correlation coefficient
is invariant under any linear transformation of scores. Hence, changing
group means need not change the correlation coefficients; in fact, the
intercorrelations with other tests in a simplex can be essentially the same
even if the order of difficulties is reversed. The reason for this is the
correlation coefficients depend on the rank order of the people who take
the test, and while scores may change, the rank order of the subjects does
not change. Therefore, Pearsonian coefficients usually do not vary much
since they are closely related to rank correlations.
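
The invariance argument can be illustrated with a small sketch. The scores
below are hypothetical, not data from the study, and the transformation is
assumed to have a positive slope (a negative slope would reverse the sign of
the coefficient). Rescaling the harder test so that its mean exceeds that of
the easier test reverses the order of difficulty but leaves the Pearson
correlation unchanged.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for six children on an easier and a harder scale.
easy = [12, 15, 9, 18, 14, 11]
hard = [3, 6, 2, 9, 4, 5]

# Positive linear transformation: the "hard" scale now has the higher mean.
hard_rescaled = [2 * score + 5 for score in hard]

r_before = pearson_r(easy, hard)
r_after = pearson_r(easy, hard_rescaled)

print(mean(hard) < mean(easy), mean(hard_rescaled) > mean(easy))
print(abs(r_before - r_after) < 1e-12)
```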

Thus, one may state that the items which comprise the superitems are
not only increasing in difficulty, they are also increasing in complexity.
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Summary 

The purpose of the study was to develop a test of mathematical
problem-solving behavior which provided information about the child's
mastery of the prerequisites of each problem-solving question. To provide
this additional information, each problem-solving question was preceded by
two other questions, all related to the same item stem. One question
assessed the child's understanding of the information contained in the item
stem and a second assessed knowledge of an underlying concept or process of
the problem-solving question. The first was referred to as the
comprehension question and the second, the application question.

To determine whether asking multiple questions on the same unit of
information affected response to the items, the three item types were
administered alone, in pairs, and all three together. Means for each of the
three scales, Comprehension, Application, and Problem-solving, were
compared across the four instruments containing each scale. A post hoc
procedure identified two significant differences among all 18 differences
considered.

Two hypotheses were advanced to account for the significant
differences: (a) the children did not have the same amount of time to
respond to a particular group of items on all of the tests containing that
group of items, and (b) asking multiple questions on the same unit of
information affects the response to the question. The conclusion was that
the significant differences were probably the result of administration
times; that is, the children did not have the same amount of time to
respond to a particular category of items on each test containing those
items and this affected their scores.

Conditional probabilities were computed for each superitem to ascertain
if the comprehension item was assessing a real prerequisite of the
application item and if both comprehension and application items were
assessing actual prerequisites of the problem-solving item. A superitem was
deemed acceptable if all four of the conditional probabilities were at
least .75; 10 of the 22 superitems were in this category with an additional
six marginally acceptable. If one of the four conditional probabilities was
less than .50, the item was considered unacceptable; five superitems were
in this category. Two of these five superitems were believed to be in this
category for reasons other than failure to fit the test model.
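
The decision rule can be sketched as follows. This is an illustration under
stated assumptions rather than the procedure used in the study: the four
conditional probabilities are taken as already computed (this summary does
not define them), the function name and the example values are
hypothetical, and "marginal" is assumed to mean every case that is neither
acceptable nor unacceptable.

```python
def classify_superitem(cond_probs):
    """Classify a superitem from its four conditional probabilities."""
    if len(cond_probs) != 4:
        raise ValueError("expected four conditional probabilities")
    # Unacceptable: at least one conditional probability below .50.
    if any(p < 0.50 for p in cond_probs):
        return "unacceptable"
    # Acceptable: all four conditional probabilities at least .75.
    if all(p >= 0.75 for p in cond_probs):
        return "acceptable"
    # Assumption: everything in between is treated as marginally acceptable.
    return "marginal"


# Hypothetical values for three superitems.
print(classify_superitem([0.91, 0.84, 0.88, 0.79]))
print(classify_superitem([0.91, 0.70, 0.88, 0.79]))
print(classify_superitem([0.91, 0.45, 0.88, 0.79]))
```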

The only indicator of validity that could be obtained was the content
validity of the test as measured by a panel of judges interested in
problem-solving research. The judges' independent classification of the
items was compared to the actual classification of the items on the test
and a measure of association was computed between these two
classifications; the computed measure was .78.

Two reliability estimates were computed for the test, a Hoyt reliability
estimate under the assumption the items were independent and a generalized
KR-20 reliability estimate in the event the items were not independent. The
Hoyt estimate was .84 and the KR-20 was .70. Due to the conclusion the
items were independent, the .70 estimate was considered conservative and
the Hoyt estimate of .84 was believed to better represent the reliability
of the test.
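
For readers unfamiliar with the Hoyt estimate, the sketch below shows how
it is obtained from a persons-by-items analysis of variance, alongside the
ordinary KR-20 formula. The response matrix is hypothetical. For
dichotomous items treated as independent units the two computations
coincide, so the sketch does not reproduce the generalized KR-20 reported
above, which treats the items as dependent and therefore yields a different
value.

```python
def hoyt_reliability(scores):
    """Hoyt estimate: 1 - MS(residual) / MS(persons) from a two-way ANOVA."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_residual = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return 1 - ms_residual / ms_persons


def kr20(scores):
    """Ordinary KR-20 for items scored 0 or 1."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


# Hypothetical responses: six children (rows) by four items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

hoyt = hoyt_reliability(responses)
kr = kr20(responses)
print(abs(hoyt - kr) < 1e-9)
```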

A cluster analysis was used to discover if there was any structure
inherent in the data. A first cluster analysis appeared to produce four
groups of children. One group had mean scores which were below the mean,
one at the mean, one above the mean, and one considerably above the mean
scores for the entire group. The results of a second clustering, produced
without the comprehension scores, were similar to those formed by
considering all three scores with the exception that the range of the
problem-solving scores within a cluster was smaller when the comprehension
scores were omitted.

Conditional probabilities for the superitems were examined for each of
the four clusters formed by considering all three scales. The number of
superitems fitting the model was fewer in the (-) cluster than in the (++)
cluster. This may have reflected the children in the (++) group responding
more consistently to the multiple choice items than those in the (-) group.

A Guttman analysis indicated that the three categories of items were in
order of complexity, that is, the problem-solving items were more complex
than the application items which were more complex than the comprehension
items. Thus, the items comprising the superitems are not merely in order of
difficulty, but also assess increasingly complex behavior.

Conclusion 

The results of the study support the contention that a test composed of
superitems is a viable form for assessing problem-solving behavior. The
superitem test produces three scores with which to correlate other measures
and scores of groups of children formed by a clustering procedure.
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Chapter 9 



Mathematical Problem-solving
Performance and Intellectual Abilities
of Fourth-grade Children

Ruth Ann Meyer 

The purpose of this study was to investigate relationships between
mathematical problem-solving performance and intellectual abilities. More
specifically, the investigator attempted to identify a structure of
mathematical problem-solving performance.

Background 

This study's inception and design are attributed primarily to A
Structure of Concept Attainment Abilities Project (CAA) (Harris & Harris,
1973). The CAA study was conducted at the Wisconsin Research and
Development Center for Cognitive Learning during 1970 and 1971 to determine
a structure of concept attainment abilities. Batteries of cognitive
abilities reference tests and tests to measure attainment of mathematics,
social studies, science, and language arts concepts were administered by
the CAA staff to samples of fifth-grade males and females. Through factor
analysis, a basic cognitive abilities structure and relationships between
concept learning and cognitive abilities in the four selected school
subjects were identified. Harris and Harris (1973) summarized the results:

We conclude that seven latent cognitive abilities underlie the test
batteries that were studied and that these are the same for both boys and
girls. The seven abilities are: Verbal, Induction, Numerical, Word
Fluency, Memory, Perceptual Speed, and Simple Visualization. The first six
are the seven Primary Mental Abilities of the Thurstones. The seventh is
similar to the Thurstones' Closure One but we prefer to call it Simple
Visualization. (p. 169)

Furthermore, the CAA staff found that: 

1. Achievement in science and social studies was related to three
abilities: Verbal, Induction, and Memory.

2. Achievement in language arts and mathematics was related to three
additional abilities: Numerical, Word Fluency, and Memory.

3. Two abilities, Perceptual Speed and Simple Visualization, seemed not
to be related to achievement in these four subject-matter fields. (p. 195)




Relevant Literature 

There are many studies such as Balow (1964), Beldin ( l^O), Johnson
(1949), Linville (1969), Norman (1950), Thompson (1967), and Treacy (1944),
which demonstrate the influence of a single element such as reading
comprehension, vocabulary, or computational ability upon success in
mathematical problem solving. However, few investigations show
relationships which may exist between a combination of elements or
abilities and mathematical problem solving. Consideration of a structure of
intellectual abilities related to successful mathematical problem solving
has been rare.

Studies of structures of intellectual abilities related to problem
solving have primarily investigated mathematical ability. These studies
provide some insights into mathematical problem solving as their batteries
of tests generally include a problem solving or application test. An
example was the investigation by Very (1967), who administered a battery of
30 tests to 335 university students. All tests were chosen to measure
abilities considered pertinent to mathematical ability. Data for the total
group, for males only, and for females only were subjected to factor
analysis by principal component procedures. For all three groups, Verbal,
Numerical, Perceptual Speed, Spatial Ability, and General Reasoning factors
were found. The General Reasoning factor, Arithmetic, Deductive, and
Inductive Reasoning factors were isolated for males only. Although three
additional reasoning factors emerged for females, Very found the factors
difficult to define.

The principal aim of the investigations by Werdelin (1958) was to
analyze the structure of the problem-solving aspect of mathematical
ability. Numerical, Verbal, Visual, Deductive, and General Mathematical
Reasoning factors were found in both his Alpha and Beta studies. After
reanalysis of the data in 1966, Werdelin commented:

Problem solving in mathematics depends primarily on the general
reasoning factor R, according to the results of this study. Only to a
somewhat smaller extent are factors like the deductive reasoning factor D
and the numerical factor N of importance. This is a result which is closely
related to the very nature of mathematics problem solving.
A problem is a task which involves several elements which shall be
combined in the solution. The elements may be taken from various fields,
such as the verbal one, the numerical one, the visual-perceptual one etc.
Therefore, it is to be expected that these problems are loaded on the R
factor as it is interpreted in the above.

Our having rotated the two studies to a common structure has enabled us
to confirm the existence of the five factors. Furthermore, it has aided us
in interpreting these. There are several questions which need to be
further studied, however. The nature of factors like D and R is still
little known and their fields of definition are largely unknown. The number
of factors in the visual-perception field and the reasoning should be
studied, and so on. The main result of the present study is probably our
having founded a platform on which to build a larger structure. (p. 13)

Other investigations of problem-solving structures were conducted by
graduate students at the Catholic University of America (Campbell, 1956;
Donohue, 1957; Edwards, 1957; Emm, 1959; Engelhard, 1955; Kliebhan, 1955;
McTaggart, 1974). Tests of problem solving and other tests believed to be
related to problem solving were administered to groups of fifth-, sixth-,
and seventh-grade males and females. Verbal and Arithmetic factors were
identified for each of the six groups. In addition, Campbell (1956) found,
for sixth-grade males, a factor which involved comparison of data prior to
problem solving. Donohue (1957) found an Approach-to-Problem-Solving factor
for seventh-grade males and females, Emm (1959) identified a Spatial factor
for fifth-grade males, and McTaggart (1974) found another Verbal factor for
fifth-grade females.

These factor analytic studies of mathematical ability and problem
solving, as well as the CAA Project, suggested the existence of a stable
intellectual structure of Verbal, Numerical, Reasoning, Spatial, Perceptual
Speed, and Memory factors. How each of these factors related to mathematics
achievement was not clear; but, significantly, one of the reasoning factors
of each of the mathematical ability and problem-solving studies had been
determined primarily by mathematics tests. It was the purpose of the
present factor analytic study to investigate the relationship between these
stable intellectual factors and mathematical ability, particularly
mathematical problem solving.

Procedures 

Subjects 

The subjects of this study were 179 fourth-grade children from
Wisconsin, Illinois, and New York. Participation was determined by: (a)
enrollment in Developing Mathematical Processes (DMP), a K-6 elementary
mathematics program developed by the Analysis of Mathematics Instruction
Project of the Wisconsin Research and Development Center for Cognitive
Learning (Romberg, 1976; Romberg, Harvey, Moser, & Montgomery, 1974, 1975,
1976); (b) fourth-grade level; (c) geographic area; and (d) willingness of
principals and teachers to have their students included in the study.

To ensure similarity in experiential background for the sample, the
investigation was restricted to fourth-grade children who were studying
DMP. Since, at that time, only a few pilot schools were using DMP materials
beyond fourth grade, it would have been difficult to procure a sample of
200 children at any higher grade. The mathematical problem-solving test was
designed for children in at least fourth grade. The geographic area
constraint was primarily for the convenience of the investigator.

Instruments

Twenty tests were administered to the sample in this study. Of these
tests, 19 were "reference" tests for intellectual abilities and the
remaining test was a mathematical problem-solving test constructed by
Romberg and Wearne (Wearne, 1976). The Romberg-Wearne test was designed to
yield three scores: a comprehension score, an application score, and a
problem-solving score. To accomplish this, the test was composed of groups
of items called superitems; each of these superitems contained an item
stem, a comprehension item, an application item, and a problem-solving
item. This Romberg-Wearne problem-solving instrument and its superitems
were described in Chapter 8. An example of a superitem is given here:

(Item stem)               A parking lot has room for 8 rows of cars with
                          9 cars parked in each of those rows.

(Comprehension item)      The parking lot has room for the same number of
                          cars in each of 8 rows.

(Application item)        How many cars can be parked in the parking lot?

(Problem-solving item)    In another parking lot, trucks are parked. Each
                          truck takes the space of 3 cars. There are 12
                          trucks in the parking lot and it is completely
                          full. If there were 4 rows in the parking lot,
                          how many cars could be parked in each row?
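
The arithmetic that separates the application item from the problem-solving
item can be laid out as a short sketch. The answers are computed here for
illustration and are not quoted from the test; the variable names are
hypothetical, and the rows of the second lot are assumed to be of equal
length.

```python
# Application item: capacity of the first parking lot.
rows = 8
cars_per_row = 9
capacity = rows * cars_per_row

# Problem-solving item: a second lot, full with 12 trucks, each truck
# taking the space of 3 cars, arranged in 4 rows (assumed equal in length).
trucks = 12
car_spaces_per_truck = 3
truck_lot_rows = 4
car_spaces_in_truck_lot = trucks * car_spaces_per_truck
cars_per_row_in_truck_lot = car_spaces_in_truck_lot // truck_lot_rows

print(capacity)                    # one multiplication
print(cars_per_row_in_truck_lot)   # a multiplication followed by a division
```

The application item calls for a single operation on the numbers given in
the stem, whereas the problem-solving item requires the child to convert
trucks to car spaces before dividing by the number of rows.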

Although the primary objective of this study was to examine
performances of children in problem situations similar to those in the
problem-solving questions of the Romberg-Wearne test, the test also
provided information about prerequisite computation skills and mathematics
concepts for the problem-solving questions. Therefore, the three measures
of the Romberg-Wearne test, a Comprehension score, Application score, and
Problem-solving score, were used in all analyses of this study.

Table 1 lists the other 19 tests administered to the sample. This table
indicates the intellectual abilities hypothesized for the respective
reference tests and gives the source of each test. Intellectual structures
identified in past factor analytic studies of mathematical ability or
mathematical problem solving [...] Attainment Abilities Project (Harris &
Harris, 1973) suggested the hypothesized structure [...].

[Table 1 is not recoverable from the source. Legible ability labels:
Verbal, Numerical, Word fluency, Memory, Perceptual speed. Legible sources:
Iowa Tests of Basic Skills; constructed by CAA staff; Sheridan
Psychological Services; constructed by Romberg; Kit of Reference Tests; PMA
4-6 Test Battery (SRA).]

All but one of the reference tests were selected from those used in the
CAA study [...]. The one test used which was not in the CAA battery was
Mathematics Computation (Romberg, 1975). The investigator attempted to
select from the CAA battery those tests which she hypothesized were related
to problem solving. Also, since this was to be a factor analytic study, at
least two reference tests were included for each hypothesized ability. A
brief description of each of the tests chosen is given in the next section.

Description of Reference Tests for Cognitive Abilities 

1. Figure Matrix. In this test the subject is to infer two spatial
relations (across and down) and combine them. Then the figure that belongs
in the cell with the question mark is selected from five choices. Example:
[figure not reproduced]


2. Gestalt Completion. This test involves naming an object from a
partially obliterated picture. Example: [figure not reproduced]


3. Identical Picture. In this test the subject selects from five choices
a figure identical to a given one. Example: [figure not reproduced]

4. Letter Classification. In each item of this test the subject is to
infer a class from three given examples. Then a fourth example of the class
is selected from three choices. Example:

A B C C 1. B A B C 
CDAA 2. ADBB 
erJAA 3. AACB 

5. Mathematics Computation (Romberg, 1975). This test consists of the
following types of problems: addition, subtraction, place value, ordering,
finding the missing number, and representing parts of a whole.

6. Number Classification. In this test, similar to Letter
Classification, the subject is to examine the structure and form of three
examples and infer a class to which all three belong. Then another example
of that class is selected from five choices.

7. Number Exclusion. This test parallels Number Classification, but the
task required is exclusion rather than classification. Given four examples,
the subject is to infer a class that includes three of them, and to
indicate the example that is excluded from that class. Example:

A. 5    B. 75    C. 750    D. 885

8. Number Series. Numbers forming a series are given in this test. The
subject must infer a quantitative rule and indicate which of five choices
would come next in the series. Example:

2 8 14 20 A. 16 

B. 20 

C. 22 

D. 24 

E. 26 

9. Omelet. In this test words are given with the letters in scrambled
order. The subject is to identify each word and spell the word correctly.

10. Perceptual Speed. This test requires circling two identical pictures
from four given figures. Example: [figure not reproduced]

11. Picture Class Memory. In this test the subject studies 10 sets of 3
pictures. The three pictures in each set are examples of a class. The
subject infers the class, remembers it, and then judges whether or not 20
sets of 2 pictures each belong to a class that was studied. Example:
[figure not reproduced]




12. Picture Group Name Selection. In this test three pictured examples
of a class are given. The subject is to infer the class and select the best
name for the class. Example:



All arc: 

A. animals 

B. brown animals 

C. dogs 



13. Remembering Classes: Members. For this test the subject studies 10
sets of 3 words. Immediately following the study period, the subject
indicates whether or not each of 20 sets of 2 words belongs to a class that
was studied. Example:

A. daisy         I. daisy
   rose             pansy
   poppy        II. daisy
                    grass

14. Remote Class Completion. In this test the subject is to produce a 
fourth word that goes with three given words. The given words all go together 
in some way, but the class is a remote one. Example: 



America 



eye 



hawk 



15. Seeing Trends. In each item of this test four examples are given.
The subject infers a rule based on number of letters, alphabetic position
of letters, etc. Using this rule, the subject places the word, given in
parentheses, in its proper serial position. Example:

all        boy        cage        (dot)



16. Spatial Relations. From four choices the subject chooses the figure
that would complete a given figure to form a square. Example: [figure not
reproduced]


17. Spelling. In each item of this test the subject is to select the
misspelled word if there is one, or select "no mistakes" if each of four
words is spelled correctly.

18. Vocabulary. In each item the subject is to select from four choices
a synonym for the underlined word in a phrase.

19. Word Group Naming. In each item of this test four examples of a
class are given. The subject must supply a name for the class. Example:

tepee        beehive
nest         igloo

All are ________

Methodology 

One method for studying relationships between variables is
intercorrelation analysis. However, a large number of variables makes the
task of explaining all of the resulting intercorrelations nearly hopeless.
Factor analysis provides techniques for summarizing relationships between
variables, thereby making interpretations easier. Butcher (1968) commented
about factor analysis:

This is a powerful mathematical technique for unravelling a complex
pattern of overlapping influences, and is in many ways ideally suited to
provide an answer to the questions that have been asked about the
structure of human abilities. Indeed, the views of psychologists at the
present time have been more strongly influenced by the results of
factor-analyzing test scores than by any other approach. (pp. 42-43)

Since the primary aim of this study was to determine a structure of
mathematical problem-solving performance by investigating relationships
among a large number of variables, factor analysis was deemed appropriate.

Because factor analysts adhere to different theoretical bases, many
factor analytic procedures have emerged. Moreover, when these different
methods are used with a given set of data, different factor structures may
result. Because of this indeterminacy, the conservative approach to factor
analysis of Harris and Harris (1973) was used in this study.

Harris and Harris (1973) used three initial factor methods: Alpha
(Kaiser & Caffrey, 1965); Harris R-S² (Harris, 1962); and Unrestricted
Maximum Likelihood Factor Analysis (UMLFA) (Joreskog, 1967). Kaiser's
normal varimax procedures (Kaiser, 1958) were used in the present study to
obtain orthogonal solutions for each of the three initial solutions. For
each of the sets of orthogonal common factors, two derived oblique
solutions, Independent Cluster and A'A Proportional to L, were derived by
procedures used by Harris and Kaiser (1964). It was necessary to procure
both oblique solutions for the data as it was impossible to predict which
results would be more interpretable.



Results 

Means, Standard Deviations, and Reliabilities

The Generalized Item and Test Analysis Program (GITAP) (Baker, 1969)
was used to obtain means, standard deviations, and Hoyt analysis of
variance reliability estimates for each of the 19 reference tests for
intellectual abilities and the three parts of the Romberg-Wearne
Mathematical Problem-solving Test. These statistics are presented in
Table 2.

The Hoyt reliability estimates for the reference tests were generally
good. Ten of the estimates were equal to or greater than .80; two estimates
were greater than .90. The reliability of the Application part of the
Romberg-Wearne test was .69; however, the reliabilities of the
Comprehension and Problem-solving parts were relatively low, .48 and .59,
respectively.
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Table 2 



Means, Standard Deviations, and Reliability
Estimates for Test Scores

                                                     Standard      Hoyt
     Test                            Items   Mean    deviation   reliability

 1   Figure Matrix                     20     8.92     3.91         .74
 2   Gestalt Completion                20    12.23     3.66         .75
 3   Identical Picture                 48    26.61     9.19         .95
 4   Letter Classification             20    13.78     3.37         .72
 5   Mathematics Computation           54    40.51     8.12         .89
 6   Number Classification             30    24.07  [illegible]     .91
 7   Number Exclusion                  20    13.88  [illegible]     .81
 8   Number Series                     20    12.84     4.15         .81
 9   Omelet                            20    10.32     5.01         .88
10   Perceptual Speed                  40    27.44     6.58         .89*
11   Picture Class Memory              20    15.45     3.20         .78
12   Picture Group Name Selection      20    12.08     3.07         .63
13   Remembering Classes: Members      20    13.85     3.43         .71
14   Remote Class Completion           25    12.75     4.08         .75
15   Seeing Trends                     20    11.85     3.79         .73
16   Spatial Relations                 25    15.94     4.18         .76
17   Spelling                          38    14.?1     6.95         .87
18   Vocabulary                        38    24.40     7.28         .89
19   Word Group Naming                 20    12.18     4.27         .80
20   Comprehension                     19    13.52     2.41         .48
21   Application                       19     9.72     3.33         .69
22   Problem Solving                   19     3.47     2.40         .59

Note. Number of subjects is 179. 

Single-battery Factor Analyses 

To examine the relationships between mathematical problem-solving
performance and intellectual abilities, the 19 intellectual ability
measures and three problem-solving scores were combined into one matrix for
single-battery factor analyses. The intercorrelations of these 22 variables
(Matrix B) are given in Table 3.

After finding orthogonal and oblique rotations of the Alpha, Harris
R-S², and Unrestricted Maximum Likelihood initial factor solutions of
Matrix B, an interpretation strategy of Harris and Harris (1971) was
applied to the three orthogonal and three A'A Proportional to L oblique
solutions. The A'A Proportional to L oblique solution was more easily
interpreted for these particular data than was the Independent Cluster
solution.

This interpretation strategy attempts to determine factors that are
robust with respect to method: factors which tend to include the same
variables across methods. A variable was considered relevant to a factor if
it had a coefficient greater than .30 (absolute) on that factor. A
comparable common factor was defined as having two or more of the same
relevant variables on at least four of the six derived solutions.
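
The rule can be made concrete with a short sketch. It is an illustration
under stated assumptions, not the procedure of Harris and Harris: the
factors are assumed to have been matched across the six derived solutions
already, the function names are hypothetical, and the loadings shown are
invented for the example rather than taken from Table 4.

```python
from itertools import combinations

THRESHOLD = 0.30      # a variable is relevant if |coefficient| > .30
MIN_SHARED = 2        # at least two of the same relevant variables ...
MIN_SOLUTIONS = 4     # ... on at least four of the six derived solutions


def relevant_variables(loadings):
    """Names of the variables whose absolute coefficient exceeds .30."""
    return {name for name, c in loadings.items() if abs(c) > THRESHOLD}


def is_comparable_common_factor(solutions):
    """Apply the rule to one factor's loadings on each derived solution."""
    relevant_sets = [relevant_variables(s) for s in solutions]
    all_variables = sorted(set().union(*relevant_sets))
    for group in combinations(all_variables, MIN_SHARED):
        shared = sum(1 for r in relevant_sets if set(group) <= r)
        if shared >= MIN_SOLUTIONS:
            return True
    return False


# Hypothetical loadings of three tests on one factor in six solutions.
verbal_like = [
    {"Spelling": 0.62, "Vocabulary": 0.55, "Omelet": 0.41},
    {"Spelling": 0.58, "Vocabulary": 0.49, "Omelet": 0.28},
    {"Spelling": 0.66, "Vocabulary": 0.52, "Omelet": 0.35},
    {"Spelling": 0.31, "Vocabulary": 0.44, "Omelet": 0.12},
    {"Spelling": 0.25, "Vocabulary": 0.38, "Omelet": 0.33},
    {"Spelling": 0.61, "Vocabulary": -0.47, "Omelet": 0.05},
]
unstable = [{"Spelling": 0.45, "Vocabulary": 0.10}] * 6

print(is_comparable_common_factor(verbal_like))
print(is_comparable_common_factor(unstable))
```

Requiring the same pair of variables to be relevant together on at least
four solutions is the strictest reading of the definition; a looser reading
would count each variable separately.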



[Table 3, the intercorrelation matrix for the 22 variables of Matrix B, is
not recoverable from the source; its rows and columns are the 22 tests
listed in Table 2.]

The Harris and Harris interpretation strategy yielded six comparable
common factors. Table 4 gives the loadings of the variables relevant to the
respective comparable common factors. Those variables which had loadings
greater than .30 on at least four of the derived solutions are given in
capital letters.

Comparable Common Factor 1 (B-CCF 1) appeared to be a Verbal factor
combining Word Fluency and Verbal Comprehension. Comparable Common Factor 2
(B-CCF 2) was classified as Induction of classes employing symbolic
content, and Comparable Common Factor 3 (B-CCF 3) appeared to be a
Numerical factor. Comparable Common Factor 4 (B-CCF 4) was readily
identified as a Perceptual Speed factor, and Comparable Common Factor 5
(B-CCF 5) was an Induction factor employing verbal semantic, pictorial
semantic, or figural content. Last, Comparable Common Factor 6 (B-CCF 6)
appeared to be a factor specific to mathematics. This factor was determined
primarily by the three scores of the Romberg-Wearne Problem-solving Test.

In addition, Application, Part II of the Romberg-Wearne Problem-solving
Test, had small loadings on the orthogonal Harris R-S², orthogonal UMLFA,
and Alpha oblique solutions of B-CCF 1 (a Verbal factor). Comprehension,
Part I of the Problem-solving Test, had a small loading for the two derived
UMLFA solutions of B-CCF 3 (a Numerical factor). Furthermore, Application
had small loadings for all three orthogonal solutions of B-CCF 5
(Induction), and Comprehension had one small loading for the Alpha
orthogonal solution of B-CCF 5.

Frequency Responses for the Problem-solving Test

Table 5 shows that comprehension of the information given in the item
stem and mastery of the prerequisite mathematics concept or skill did not
guarantee success with the Problem-solving question. For instance, 131
subjects appeared to comprehend data given in superitem 10 and 105
multiplied 8 × 9 correctly, but only 19 found the correct answer to the
Problem-solving question.
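
Expressed as proportions of the 179 subjects, the superitem 10 frequencies
make the drop from prerequisite to problem solving easier to see. The
sketch below uses only the counts quoted above; the joint frequencies are
not given here, so only an upper bound on the conditional proportion can be
computed, and the variable names are illustrative.

```python
n_subjects = 179
correct = {"Comprehension": 131, "Application": 105, "Problem solving": 19}

# Proportion of the sample answering each part of superitem 10 correctly.
proportions = {part: count / n_subjects for part, count in correct.items()}
for part, p in proportions.items():
    print(f"{part:16s} {p:.2f}")

# At most 19 of the 105 children who multiplied correctly can also have
# solved the problem, so this ratio bounds the conditional proportion of
# problem-solving success given application success from above.
upper_bound = correct["Problem solving"] / correct["Application"]
print(f"upper bound      {upper_bound:.2f}")
```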

Generally, the highest number of correct responses was on the
Comprehension questions, the next highest on the Application questions, and
the lowest on the Problem-solving questions. Three exceptions were the
Application and Problem-solving scores for superitems 4, 7, and 17. In
superitem 4, children confused the concepts of perimeter and area in the
Application question. In superitem 7, a few children seemed to know the
meaning of average, but they could not compute the average of three
numbers. The expression 2(rO + 100 caused children to have difficulty with
the Application question of superitem 17.

Discussion and Conclusions 

The six comparable common factors (Verbal, two Induction, Numerical,
Perceptual Speed, and General Mathematics) resembled the factors of the
hypothesized structure for this study, although there were differences. The
hypothesized factors Word Fluency, Simple Visualization, and Memory were
not isolated. The two reference tests for Word Fluency, which were Omelet
and Spelling, helped to determine the Verbal factor for the sample. Gestalt
Completion, one of the reference tests for Simple Visualization, had small
loadings on the Alpha solutions of B-CCF 1 (Verbal) and B-CCF 4 (Perceptual
Speed). Spatial Relations, the other reference test for Simple
Visualization, and the two reference tests for a Memory factor, Picture
Class Memory and Remembering Classes: Members, had significant loadings on
all but one of the derived solutions of B-CCF 5. Therefore, induction
seemed to be more important than remembering for the two Memory tests and
more important than visualizing for the Spatial Relations test.

Table 4
Comparable Common Factors for Matrix B

[Loadings not recoverable from the source. The table reports, for each
comparable common factor, the loadings of the relevant tests on three
orthogonal and three oblique solutions, with columns headed A, H, and U.
Legible row labels for B-CCF 1: Omelet, Remote Class Completion, Spelling,
Vocabulary, and Word Group Naming. Legible row labels for B-CCF 2: Letter
Classification, Number Classification, and Number Exclusion.]

Comprehension, Application, Problem Solving, and Mathematics
Computation determined a General Mathematics factor. This factor resembled
Very's Arithmetic Reasoning factor (1967), Werdelin's General Mathematical
Reasoning factor (1958, 1966), and the Arithmetic factor identified by the
series of studies conducted by graduate students at the Catholic University
of America from 1956-1959. In all of these studies, including the present
one, a factor specific to mathematics emerged. However, while in past
studies this factor was determined by mathematical reasoning, it appeared
to be determined by mathematics concepts in the present study.

The loadings of Problem Solving, Part III of the Romberg-Wearne test, on
both the orthogonal and oblique UMLFA solutions of B-CCF 3, suggested some
relationship between Problem Solving and Numerical Ability. The three small
loadings of Application on B-CCF 1 suggested that applications are related
slightly to Verbal ability. The other small loadings of Comprehension and
Application are probably of little consequence.

Further Analyses 

All analyses for this study were for males and females combined. Since
the investigator was also interested in any sex-related differences in
mathematical problem-solving performance and intellectual structure, the
data were reanalyzed for males and females separately.

The t-tests employed demonstrated significant sex-related differences
for only two of the intellectual variables, Spatial Relations (p < .01) and
Picture Group Name Selection (p < .03). Both were in favor of males. There
were no significant sex-related differences for the three scores of the
Romberg-Wearne Mathematical Problem-solving Test. However, factor analytic
procedures resulted in different structures for males and females.

Five comparable common factors were identified for males and six
comparable common factors emerged for females. The factors for males were:
(a) Verbal Ability and Word Fluency, (b) Induction of classes employing
symbolic or figural content, (c) Perceptual Speed, (d) Problem Solving, and
(e) Mathematics Concepts. The factors for females included: (a) Verbal
Ability, (b) Induction of classes employing pictorial, figural, or verbal
content, (c) Numerical Ability, (d) Perceptual Speed, (e) a Fluency factor
employing either words or numbers, and (f) General Mathematics.

Significantly, the three measures from the problem-solving test resulted
in two mathematics factors emerging for males and only one for females. The
Problem-solving factor for males was determined primarily by the
Problem-solving questions of the Romberg-Wearne test together with the
reference tests, Gestalt and Omelet. The Comprehension and Application
questions caused the fifth factor, Mathematics Concepts, to emerge for
males. The General Mathematics factor for females was determined primarily
by all three problem-solving measures and Mathematics Computation.

Another factor, Numerical Ability, could be considered a mathematics
factor for females. However, none of the problem-solving measures had
significant loadings for this factor, which was caused by Number Series and
Seeing Trends.

One explanation for the sex difference in the number of comparable
common factors determined by the three problem-solving scores on the
Romberg-Wearne test was that females approached problem solving more
systematically. Their methods on the Problem-solving questions paralleled
their approaches to the Application questions. Males may have used
algorithms and school achievement for the Application questions, but used
more of a Gestalt approach for the Problem-solving questions.

Summary 

Generalizability of the results of this study was limited by the
nonrandom sample, the battery of reference tests, and the difficulty of the
Problem-solving questions. The Problem-solving mean (Part III of the
Romberg-Wearne test) was only 3.47 and the standard deviation was 2.40. The
range of correct responses was 0-13. Fewer than 8% of the sample had over 6
of the 19 Problem-solving questions correct.
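
To give a sense of how difficult Part III was, the figures quoted above can be restated relative to the length of the test. The following is a minimal sketch that does nothing more than arithmetic on the reported mean, standard deviation, and item count; the function names are illustrative and are not part of the study.

```python
# Summary statistics quoted above for Part III (Problem Solving).
N_ITEMS = 19          # number of Problem-solving questions
MEAN_CORRECT = 3.47   # reported mean number correct
SD_CORRECT = 2.40     # reported standard deviation


def proportion_correct(mean_correct: float, n_items: int) -> float:
    """Mean score expressed as a fraction of the items on the test."""
    return mean_correct / n_items


def z_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd


if __name__ == "__main__":
    share = proportion_correct(MEAN_CORRECT, N_ITEMS)
    print(f"Mean proportion correct: {share:.1%}")
    print(f"z for a raw score of 6: {z_score(6, MEAN_CORRECT, SD_CORRECT):.2f}")
```

On these figures the average subject answered roughly 18% of the Problem-solving questions correctly, and a raw score of 6 lies about one standard deviation above the mean.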

Almost all reference tests were selected from a battery used for the
CAA Project (Harris & Harris, 1973). The investigator attempted to select
from these "concept attainment" tests those she believed to be related to
problem solving. The selected battery accounted for 44% of the variance of
the Problem-solving questions, 62% of the variance of the Application
questions, and 42% of the variance of the Comprehension questions.
Significantly, the variance of the mathematics concepts tests of the CAA
study, accounted for by the complete battery of reference tests, ranged
from .39 to .61.

The Problem-solving questions of the present study appeared to be as
highly related to the "concept attainment" tasks as were some of the
mathematics concepts tests of the CAA study. The Application questions were
more highly related to concept attainment than were any of the mathematics
concepts tests of the CAA study.

In conclusion, the study suggested the following: 

1. Intellectual structures contain a factor specific to mathematics. 

2. Problem Solving appears to be related to Numerical Ability. 

3. Prerequisite mathematics skills and concepts are related to, and ac- 
count for, some of the variance of problem solving. However, knowing these 
skills and concepts does not guarantee successful problem solving. 






Chapter 10 

Sex, Visual Spatial Abilities, and 
Problem Solving 

Ann Schonberger 

This study was initiated in 1975, International Women's Year, during
which time the attention of the world was drawn to women's struggle for
equal participation in all society's activities. Important to such equal
participation is the ability to solve mathematical problems. In the United
States men far outnumber women in occupations requiring high mathematical
competence. Only hypotheses for causes of this imbalance exist. Some
possible reasons are sex bias in career counseling, discrimination in
admission to specialized schools, and differences in sex-role
socialization. In addition, inherent differences in mathematical ability
have been suggested (Carnegie Commission on Higher Education, 1973). Some
have said that while girls may be more proficient in computation, boys
excel at mathematical reasoning (Jarvis, 1964; Maccoby, 1966). If this is
true, mathematical reasoning could be the "critical filter" (Sells, 1973)
in the scientific and technical job market, since in these occupations the
application of mathematics to problems is valued more highly than
computational proficiency.

Employment is not the only area in which women's equal participation
will depend on their ability to solve mathematical problems. As consumers
of housing and transportation as well as food and clothing, women will need
to solve practical mathematical problems. Women should also share equally
in the intellectual activities of society:

Solving problems is the specific achievement of intelligence, and
intelligence is the specific gift of mankind: solving problems can be
regarded as the most characteristically human activity. (Polya, 1962, p. v)

Is it true that males are better solvers of mathematical problems than
females? If so, what other differences between men and women might be
involved? An area of cognitive abilities which might be related is visual
spatial abilities. The development of sex-related differences in these
abilities parallels or precedes the development of differences in
mathematics achievement (Fennema, 1974; Maccoby & Jacklin, 1974). The use
of charts, diagrams, and graphs in all branches of mathematics certainly
argues for the logic of this connection. Questions about sex-related
differences in mathematics and in spatial abilities and the relationships
between the two, as well as questions about the role of drawing diagrams as
a link between the two abilities, provided the impetus for this study.




To establish the limits of this investigation some of the key terms in the
questions were defined. The author used Zalewski's (1974) subject-dependent
definition of a mathematical problem.

A mathematical problem is a statement which meets three conditions:

1. The statement presents information and an objective whose answer
is based on that information;

2. The objective or answer can be found by translation of the
information into mathematical terms or application of rules from
mathematical areas such as arithmetic, algebra, logic, reasoning,
geometry, number theory or topology; and

3. The individual attempting to answer the question or attain the
objectives does not possess a memorized answer or an immediate
procedure. (pp. 4-5)

The third part of this definition serves to differentiate real problems
from exercises, but introduces a difficulty in that a problem for one
person may be merely an exercise for another.

Mathematical problem solving is the process of attaining the objective 
specified in a mathematical problem. 

Mathematical problem-solving ability is the ability measured by a test of 
mathematical problems. 

The last definition, a functional one in terms of a test score, makes no assump- 
tions about the components or origins of this ability. 

In order to define the visual devices for this study, a system for
categorizing external representations of problems was necessary.
Incorporating ideas from other category systems (Bruner, 1964; Heimer &
Lottes, 1973), the following definition was used.

A pictorial representation has physical characteristics that can be
viewed, but not felt or manipulated independently of the medium in
which it is presented. Pictorial representations of objects usually
disregard some of the objects' attributes.

The following definitions are related to the mode of representation of a
mathematical or spatial problem.

A diagram is a pictorial representation of information presented in a
problem or deduced from information in the problem.

Visual spatial abilities are those measured by tests recognized in the
field of cognitive abilities as spatial, whose stimuli are pictorial
representations. According to Werdelin, an aspect common to all such tests
is "the ability to comprehend the visual organization of the material and
reorganize it" (Werdelin, 1961, p. 77).

A two-dimensional test of visual spatial ability is a test in which the
stimuli are planar geometric figures or pictorial representations in
which one dimension has been ignored.

A three-dimensional test of visual spatial ability is one in which the
stimuli are pictorial representations in which all three dimensions have
been drawn in perspective.

Background 

To clarify the issues involved in this study, the literature on visual
spatial abilities, mathematical problem-solving ability, and the
relationships between the two types of abilities was reviewed.

Visual Spatial Abilities 

The literature on spatial abilities deals with several questions relevant
to this study. Is visual spatial ability a unitary trait or a cluster of
several abilities? If there are several, how should the factors (called
spatial factors or factors in this chapter) be described and what tests
define them? Are there sex-related differences in the structure of the
spatial factors or in performance on spatial ability tests? Do all
researchers agree on what tests of spatial ability are?

Factor analyses of spatial ability data gathered during World War II 
and afterward suggested two or three subfactors of visual spatial ability. 
Michael, Guilford, Fruchter, and Zimmerman (1957) synthesized previous 
research considering complexity of stimuli, amount of manipulation involved, 
movement of parts versus movement of the whole, the subject's body orienta- 
tion, and the relative importance of speed and power. Their synthesis was 
influential. The authors were active in writing the factor descriptions and se- 
lecting tests for the Kit of Reference Tests for Cognitive Factors (French, 
Ekstrom, & Price, 1969a, 1969b) developed under the auspices of the Educa- 
tional Testing Service and referred to in this chapter as the ETS Kit. The 
spatial factors described in the ETS Kit are basically those of the Michael et 
al. synthesis, and a number of later studies, such as the National Longitudinal 
Study of Mathematical Abilities (NLSMA) (Romberg & Wilson, 1969), 
followed that framework. For these reasons it will be described in detail.

The first factor, Spatial Relations and Orientation (SR-O), was described
as the ability to comprehend the arrangement of elements within a visual
stimulus pattern with the subject's body as a frame of reference. In SR-O
tests parts of the figure remain related to each other in the same way, as
the figure as a whole is moved into a different position. The items are
usually quite easy and speed is often important.

The second factor was called Visualization (Vz). In Vz tests the subject
is expected to mentally manipulate one or more parts of a configuration
according to explicit directions. The subject must then recognize or draw
the new configuration. Stimuli are generally more complex and speed is
usually less important. The crucial difference between Vz and SR-O tests,
according to Michael et al., is that in SR-O tests the figure as a whole is
rigidly transformed, whereas in Vz tests the figure is broken up into parts
and the parts are transformed. Kinesthetic Imagery, the third factor,
appeared to involve right-left discrimination; since tests for this factor
have not shown a relationship to mathematical problem solving, it will not
be discussed further.

Guilford (1967) located these spatial factors in his three-dimensional
structure-of-intellect model, in which each cell represents a factor or
ability described by an operation on certain content to produce a product.
The SR-O factor was labeled Cognition of Figural Systems and Vz was called
Cognition of Figural Transformations. According to Guilford, SR-O and Vz
tests differ only on the product dimension. A review of factor analytic
studies further divided the spatial factors into two- and three-dimensional
categories.

The SR-O, Vz framework is not without problems, however, and factor
analytic studies have not always shown both factors. In a study of Swedish
high school males Werdelin (1958) investigated the SR-O, Vz division and
the dimensional division. Factor analysis yielded only one spatial factor
with but a faint indication of an SR-O, Vz subdivision. In Werdelin's
subsequent study (1961) of high school males and females, analysis of
males' data alone yielded both an SR-O and a Vz factor, whereas the
combined data turned up only one spatial factor. Separate analysis of the
females' data was not reported. Other studies have indicated different
factor structures for males and females, although not necessarily in terms
of the SR-O and Vz factors (Harris & Harris, 1973; Very, 1967). French
(1965) did not find the SR-O, Vz subdivision even in an all-male sample.
However, his study indicated that subjects were using different methods to
solve spatial items, some visual and some logical or analytical. Barrett
(1953) also reported different styles of solving items on five different
spatial tests.

In addition to the sex-related differences in the structure of visual
spatial abilities, sex-related differences in performance on spatial tests
have been noted in a number of reviews (Fennema, 1974; Garai & Scheinfeld,
1968; Maccoby, 1966; Smith, 1964). To understand these differences better,
one should know at what ages and on what kinds of tests they have been
observed. Studies of preadolescents published since 1965 show few
sex-related differences (Anglin, Meyer, et al., 1975; Maccoby & Jacklin,
1974), but some have appeared (Harris & Harris, 1973) on two-dimensional
SR-O, Vz tests. Sex-related differences become more apparent in adolescent
groups. Three large-scale studies found males' performance superior to
females' on a three-dimensional Vz test of the surface development type
(Bennett, Seashore, & Wesman, 1973; Droege, 1967; Flanagan, Davis, Dailey,
Shaycoft, Orr, Goldberg, & Neyman, 1964). Others have reported
significantly higher means for males on tests resembling three-dimensional
Vz tests (Bock & Kolakowski, 1973; Stafford, 1961). There is also some
evidence of higher male performance on two-dimensional SR-O tests (Flanagan
et al., 1964; Hobson, 1947; Thurstone, 1958). Two studies using a variety
of spatial tests with college subjects reported sex-related differences in
favor of males (Sherman, 1974; Very, 1967).

The last question posed about visual spatial abilities was whether or
not researchers agree as to what tests are visual and spatial. Some tests
which do not fit into the SR-O, Vz classification involve aural or tactile
perception, mechanical knowledge, or motor skills. Since these have shown
little relationship to the ability to solve mathematical problems they were
disregarded in this study. On the other hand, there is a group of tests
whose classification as spatial has not always been recognized, but whose
relationship to mathematical problem solving has often been observed. These
tests, which require the subject to pick a simple figure out of a more
complex stimulus pattern, have been called Gottschaldt's Figures, Concealed
Figures, Embedded Figures, Hidden Figures, and Hidden Patterns (see
Thurstone & Jeffrey, 1956). The corresponding ability has been named
Gestalt Flexibility, Flexibility of Closure, or Convergent Production of
Figural Transformations. It has been regarded as a cognitive style rather
than an ability and labeled Field Independence-Dependence.

That these tests should be classified as spatial was argued by Sherman
(1967) and supported by her own research (1974). Maccoby and Jacklin
(1974) followed Sherman in considering the tests spatial, a change from
Maccoby's (1966) categorizing them as measures of field independence.
Other evidence of a spatial component comes from the ETS Kit manual
(French et al., 1969a, 1969b), from French's (1965) study, and from
Guilford's synthesis (1967). The nature and size of the spatial component
as well as the identity of its other components are unresolved at present.
Also, there appear to be different styles of solving items of this type
(French, 1965). For these reasons this author prefers to call them tests
of visual disembedding, a descriptive term that makes the fewest
assumptions about the underlying cognitive processes.

The demonstrated relationship of these tests to mathematical problem
solving, as well as the significantly better performance by males on such
tests, were the reasons for this study's concern with tests of visual
disembedding. Sex-related differences have appeared in a number of studies
summarized by Witkin, Dyk, Faterson, Goodenough, and Karp (1962) and have
been the subject of a great deal of controversy, although the differences
are generally of small magnitude (Kagan & Kogan, 1970). In fewer than half
of the more recent studies reviewed by Maccoby and Jacklin (1974) did
sex-related differences appear; there was some indication that such
differences paralleled those in other spatial tests both in magnitude and
in time of appearance.

In summary, it appears that spatial tests can be categorized as SR-O or
Vz tests with additional subdivision based on dimension. Tests of visual
disembedding can also be considered spatial although this has not always
been accepted. There is some indication that the structure of the spatial
factor is different for males and females. Sex-related differences in
performance in favor of males, appearing in adolescence, have been found in
a number of studies, especially on three-dimensional Vz tests. Even the
largest of these differences in means are usually less than half a standard
deviation, so the within-sex variation is definitely greater than the
between-sex variation.
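
The comparison of within-sex and between-sex variation can be made concrete with a small calculation. The sketch below is an illustration only, under two assumptions that are not stated in the studies reviewed: the two groups are of equal size, and both have the same within-group standard deviation. Under those assumptions each group mean lies d/2 standard deviations from the grand mean, so the between-group share of total variance is d²/(d² + 4). The function name is hypothetical.

```python
def between_group_share(d: float) -> float:
    """Share of total variance lying between two equal-sized groups.

    d is the difference between the group means in within-group
    standard deviation units (assumed equal in both groups).
    """
    between = (d / 2.0) ** 2   # each mean sits d/2 SDs from the grand mean
    within = 1.0               # within-group variance, in SD units
    return between / (between + within)


if __name__ == "__main__":
    for d in (0.1, 0.25, 0.5):
        print(f"d = {d:.2f}: {between_group_share(d):.1%} of variance between groups")
```

For a difference of half a standard deviation, the upper bound mentioned above, the between-group share is about 5.9%, leaving roughly 94% of the variance within each sex.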

Mathematical Problem-solving Ability 

One might consider items from most types of spatial tests to be
problems in transformational geometry. Are women similarly handicapped in
solving all other types of mathematical problems? If so, at what age do
males begin to outperform women in mathematical problem solving? To answer
the latter question studies were grouped as follows: elementary, grades 7
and 8, grades 9 through 12, college, and adult.

To qualify for inclusion in this review a study must have used test
items intended to measure mathematical abilities other than computation and
test items which seemed to this reviewer to satisfy the definition of a
mathematical problem given in the introduction. For example, studies of
mathematical reasoning were often included. Selecting studies involved
judgment because of the definition's requirement that the individual
attempting to answer the question must not possess a memorized answer or an
immediate procedure. The general policy was to include doubtful studies.

Several of the studies which contributed the most to this review are
longitudinal and a word needs to be said about their methodology. One is
the National Longitudinal Study of Mathematical Abilities (NLSMA) (Romberg
& Wilson, 1969), probably the most intensive and extensive study in this
area. Three different groups were tested: one in grades 4 through 8,
another in grades 7 through 11, and the third in grades 10 through 12. A
content (number systems, geometry, algebra) by level of behavior
(computation, comprehension, application, analysis) matrix was used to
categorize the mathematics scales. Both the application and analysis
scales are included in this review, but the definition of analysis items
seems closer to this study's definition of a mathematical problem. The
NLSMA study, designed to compare certain textbook series, involved
primarily college-capable students. Another factor which should be
considered in evaluating the results is that the sex-related differences
reported were those remaining after removal of the variance due to verbal
and nonverbal IQ and mathematics achievement. A second longitudinal study
(Hilton & Berglund, 1974), whose results are reviewed, measured the same
students in grades 5, 7, 9, and 11 using the Sequential Test of
Educational Progress-Mathematics (STEP-Math) (Cooperative Test Division,
1956-72), which those authors regarded as a measure of the ability to apply
skills to problem solving. The sample was divided into an academic group
and a nonacademic group according to what program they eventually pursued
in high school, and results were analyzed by group.

In the NLSMA study of grades 4 through 6, boys excelled on two out
of three application scales, both concerned with number systems, and on the
only analysis scale, a geometry scale (Carry & Weaver, 1969). Hilton and
Berglund (1974) found no significant differences between girls and boys in
either group on STEP-Math. In a study using fifth-grade subjects (Harris &
Harris, 1973), no sex-related differences were found on either of two
cognitive abilities tests containing mathematical problems. Similarly, no
differences between boys' and girls' performance on an arithmetic reasoning
test were found by Parsley, Powell, O'Connor, and Deutsch (1963). A second
study (1964) by Parsley, Powell, and O'Connor indicated better performance
by males in 12 subgroups and by females in seven subgroups out of a total
of 75 comparisons. In a study of sixth-grade students Jarvis (1964) found
that boys of all ability levels surpassed girls in arithmetic reasoning.
Clearly, although some differences have begun to appear in upper elementary
school, the results are mixed.

Sex-related differences were more apparent in the studies reviewed
using seventh- and eighth-grade students. Hilton and Berglund (1974)
reported a difference in favor of boys on STEP-Math in the academic group.
The NLSMA also gave STEP-Math to one group in seventh grade, categorizing
it as an application test, and found boys' performance to be superior
(McLeod & Kilpatrick, 1969). Sex-related differences in favor of boys were
also found on all but one of the analysis scales and on the one application
scale designed by NLSMA (Carry, 1970; McLeod & Kilpatrick, 1969). The
content of the scales on which differences were found was number systems
and geometry; the scale on which none were found was an algebra scale. In a
study of problem-solving styles in high-ability, eighth-grade subjects
Kilpatrick (1967) found that although scores for boys and girls were about
the same, girls used significantly more deduction and more equations. In
the National Assessment of Educational Progress (NAEP) consumer math skills
were measured by a test of problems given to 13-year-olds, 17-year-olds,
and young adults, ages 26 to 35. In the youngest group the boys' median was
one and one-half percent above the median of the total group and the girls'
median was one and one-half percent below (Ahmann, 1975).

With the exception of the NAEP all the studies discussed in this
section in which sex-related differences were observed were conducted with
students of above-average ability. There is another indication that overall
superiority of boys in mathematical problem solving in grades 7 and 8 may
be due to superior performance by boys of high ability. In a study of
mathematical precocity Stanley, Keating, and Fox (1974) found that in a
sample of seventh- and eighth-grade students who volunteered for screening
with the Scholastic Aptitude Test-Quantitative (SAT-Q) boys far
outperformed girls and the discrepancies increased with age.

Surveying the studies of high school students required additional
caution because required mathematics courses are often tracked and
mathematics becomes elective in the upper grades. Good examples of this
lack of control for number or type of mathematics courses taken are the
Project Talent Survey (Flanagan et al., 1964) and the NAEP (Ahmann, 1975),
both of which found sex-related differences in favor of males. In all the
high school studies reviewed here the students were in the same class or
track when tested.

Information on sex-related differences in the NLSMA was reported only
for the college preparatory group. At the applications level boys in grades
9 through 11 excelled over girls on five of 12 geometry scales and one
algebra scale. At the analysis level the boys' performance was superior on
half the algebra and number systems scales; on the geometry analysis scales
boys excelled on six of the eight and girls on two (Kilpatrick & McLeod,
1971a, 1971b; McLeod & Kilpatrick, 1971; Wilson, 1972a, 1972b). The
impression of overwhelming evidence of male superiority on NLSMA
mathematical problem-solving tests should be tempered by several
limitations of the study. The restriction to upper-ability students was
more severe in the high school data than in the junior high data. The
statistical removal of variance due to verbal and nonverbal IQ and
mathematics achievement may have left only a small fraction of the
variance. Application of the ω² statistic (Hays, 1973) to three of the
analysis scales given in grade 11 showed that on each, less than one
percent of the variance was due to sex. Sex-related differences in
performance on the analysis scales appeared most pronounced in the area of
geometry, which may be related to males' advantage on spatial items. One of
the two geometry scales on which girls excelled was Structure of Proof,
which appeared to require verbal rather than spatial skills. Finally, the
content of the number systems problems for grades 4 through 11 should be
considered. Among these were all the problems about people. In virtually
all cases in which sex of a person was specified, the person was male.
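
The statistic attributed to Hays (1973) in the paragraph above, omega squared (ω²), estimates the proportion of variance in a score that is accounted for by group membership. A minimal sketch follows, assuming the usual two-group form of the estimate, (t² − 1) / (t² + n₁ + n₂ − 1). The t value and sample sizes in the example are hypothetical, since none is reported here, and the function name is illustrative.

```python
def omega_squared_from_t(t: float, n1: int, n2: int) -> float:
    """Estimated proportion of variance accounted for by a two-group split.

    Uses the two-independent-groups estimate
        (t**2 - 1) / (t**2 + n1 + n2 - 1),
    reporting 0.0 when the raw estimate would be negative.
    """
    t_sq = t * t
    estimate = (t_sq - 1.0) / (t_sq + n1 + n2 - 1.0)
    return max(0.0, estimate)


if __name__ == "__main__":
    # Hypothetical values: a clearly significant t in a large sample.
    share = omega_squared_from_t(t=3.0, n1=500, n2=500)
    print(f"Estimated variance due to group: {share:.2%}")
```

With these made-up inputs the difference would be statistically significant, yet the estimate is just under one percent of the variance, which is the kind of contrast between significance and magnitude that the paragraph draws.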

Evidence of the importance of these issues was found in other high
school studies. In the Hilton and Berglund (1974) study boys from the
academic group scored significantly higher on STEP-Math in grades 9 and 11,
whereas in the nonacademic groups boys scored higher only in the eleventh
grade. In a study of problem solving in ninth-grade algebra Sheehan (1968)
changed a slight (but nonsignificant) advantage of girls into a significant
difference in favor of boys by statistically removing variance due to
algebra aptitude and previous mathematics achievement and knowledge of
algebra. In his study of high-ability high school students Werdelin (1961)
found sex-related differences limited to two tests of geometrical problems.

Studies of college students and adults are even more open to criticism
for lack of control for previous exposure to mathematics. Very's (1967)
study and the NAEP, both of which found males to be better problem solvers,
can be criticized on this point. The most significant body of research on
mathematical problem solving in college students is a group of interrelated
studies done first at Stanford and then at Yale. After Sweeney's (1953)
study, which found sex-related differences in addition to those due to
intellectual factors, subsequent studies investigated various noncognitive
sources of the difference. Carey (1955) found attitude toward problem
solving to be a significant factor in males' better performance on the
problem test. Following a treatment designed to improve attitude, women's
problem-solving performance improved significantly, whereas men's did not.
Berry (1958, 1959) and Milton (1957, 1958) investigated the relationship
between the Terman-Miles masculinity-femininity index and ability to solve
mathematics problems similar to those used by Carey (1955) and Sweeney
(1953). In only one of the four studies was the correlation significant
after removing effects due to verbal and quantitative factors. In the 1959
study Berry used a number of other noncognitive measures and found that the
only ones contributing to the remaining problem-solving variance were two
tests of visual disembedding and Carey's attitude test, and this only for
males. Milton investigated the effects of problem content and found men
superior at solving "masculine" but not "feminine" problems. (Needless to
say, the sex-role stereotyping was incredible.) Hoffman and Maier (1966)
also investigated the area of problem content but found no significant sex
differences.

Summarizing the research on sex-related differences in solving mathe- 
matical problems is difficult. As was the case with visual spatial abilities the 
differences may be small, but they do seem to exist, even after controlling for 
mathematics background. As with spatial abilities the differences seem to ap- 
pear in early adolescence and may increase with age until maturity. The stud- 
ies reviewed in this section indicate that the sex-related differences may be 
limited to the upper-ability level and to problems whose content is spatial or 
sex biased. 

Relationships between Solving Mathematical and Spatial Problems 

The fact that sex-related differences in both visual spatial abilities and 
mathematical problem-solving ability begin to appear in the upper elementary 
grades and develop throughout adolescence suggests that there is some
relationship between the two abilities (Fennema, 1974). Anglin, Meyer, and
Wheeler (1975) and Smith (1964) hypothesized that the importance of spatial
ability increases with the cognitive complexity of the mathematical task. In
this section the relationship between the two abilities, as seen in several stud- 
ies, is reviewed with these questions in mind: 
1. What is the evidence of a relationship between mathematical problem
   solving and visual spatial abilities?

2. Is the relationship different for males and for females?

3. If more than one measure of spatial skills was given, are the
   relationships with the test of mathematical problem solving different for
   different spatial tests?

4. If more than one problem-solving measure was used, are the
   relationships with the spatial tests different for different
   problem-solving measures?

To examine the author's hunch that the common element of spatial
ability and mathematical problem-solving tests is actually figural reasoning,
comparisons with these tests were made whenever the data were available. As
in previous sections, the discussion is organized by grade level.

In the upper elementary grades the CAA Project (Harris & Harris,
1973) provided information on the relationship among spatial abilities, figural
reasoning, and mathematical problem solving. In both years of the study, all
the correlation coefficients between pairs of these tests were significant. The
spatial tests and the mathematical problems tests were more closely related to
the figural reasoning tests than to each other, which suggests that figural
reasoning may be a bridge between the two. Only for the group in which boys had
outscored girls on the spatial test were there any sex-related differences in the
correlations. In that group the spatial test was more closely related to the
mathematical problems test for girls than for boys. In both this study and the
Anglin et al. (1975) study Vz tests were more closely related to mathematical
problem solving than were SR-O tests.

With the NLSMA data for grades 5 through 11 there are two ways to
investigate the relationship between the spatial tests and the analysis or
application measures: one involves correlation coefficients and the other involves
analysis of variance. The correlations were made among tests which had been
given in different years and were generally low. Despite the probability that
none of the differences between correlation coefficients were statistically
significant, some interesting patterns can be observed. A two-dimensional Vz test
was a better predictor of all the analysis scales than either a two-dimensional
SR-O test or a test of visual disembedding. At each grade level the spatial tests
were more highly related to the geometry scales than to the algebra or number
systems scales. The correlation coefficients decreased with age, probably
because mathematical problem solving at upper levels requires more specific
mathematical knowledge. The correlations of other mathematics scales in the
NLSMA with the spatial tests were no higher and usually lower than those of
the analysis scales in all but a few isolated cases (Wilson & Begle, 1972b).
Supporting information on the relationships between the analysis or
application scales and the spatial tests was generated by two-way analyses of
variance done separately by sex for each pair of tests (Wilson & Begle,
1972a). Significant main effects of the spatial variables on the mathematics
measures were found more often for Vz tests than SR-O tests and on geometry
scales more often than number series or algebra scales. There appeared to be
no pattern in the differences for males and females. Dodson (1972) used a
discriminant analysis of a subset of the eleventh grade NLSMA data to
characterize successful problem solvers. A test of visual disembedding
discriminated among levels of the total problem test and the geometry and number
systems subtests; it was not related to the algebra subscale. A two-dimensional
SR-O test discriminated among levels of the total test and the geometry subtest
and less significantly among levels of the other two subtests.

Werdelin's (1958, 1961) factor analytic studies also demonstrated the
close relationship between spatial and problem-solving abilities. Mathematical
problem-solving tests, especially geometry or number series tests, loaded on
the spatial factors. Two-dimensional Vz tests and three-dimensional spatial
tests of both types loaded on Reasoning factors which included all the
mathematical problems tests. In the Project Talent study (Flanagan et al., 1964) the
three-dimensional Vz tests accounted for more of the variance on each
mathematics test than did the two-dimensional SR-O test. There appeared to be no
significant sex-related differences in the relationships. As in the CAA Project
study (Harris & Harris, 1973) the figural reasoning test was related more
closely to both the spatial tests and the mathematical problems test than the
two were to each other.

The studies by Berry (1958, 1959) and Sweeney (1953) give some 
information on the relationship between spatial and mathematical abilities in 
college students. Sweeney found that matching on performance on a two- 
dimensional SR-O test was as effective as matching on years of mathematics 
taken in removing or reducing sex-related differences in performance on his
problem-solving tests. Berry found a test of visual disembedding was almost as
closely related to his tests of mathematical problem solving as was the SAT-Q.

Thus, in these correlational studies, visual spatial abilities appeared to
account, at least as much as any other type of cognitive ability, for part of the
variance in mathematical problem solving. The one exception occurred when
tests of figural reasoning were included; these tests were more closely related to
both spatial and problem-solving tests than the latter two were to each other.
Spatial tests were related to problems with different content in this decreasing
order: geometry; practical situations or arithmetic; algebra. Tests of the Vz
factor were more closely related to mathematical problem solving than SR-O
tests; tests of visual disembedding may fall somewhere in between. Whether or
not there are sex-related differences in the relationships is unclear.
Correlational studies do not give evidence of cause and effect, but it is
usually assumed in the literature that spatial ability is somehow more
fundamental than the ability to solve mathematical problems, which involves other
components as well. To investigate how spatial skills are used to solve
mathematical problems one has to turn to introspective as well as experimental
research. Hadamard's (1954) account of his own and Einstein's thinking
suggests that the role played by imagery was to record the relationships or
patterns among the elements of the problem and to facilitate combining the
elements into new patterns. Poincaré (1929) noted that there were individual
differences in the use of visual imagery among mathematicians, irrespective of
problem content. Menchinskaya (1946) also described this variation among
ordinary people solving problems.

In addition to mental images, another aid to problem solving is
diagrams, that is, visual images externalized on paper or chalkboard. Two studies on
problem solving in geometry (Sherrill, 197?; Webb & Sherrill, 1974) have
shown the importance of a correct diagram. Botsmanova (1960) noted the
value of pictures illustrating the mathematical structures of arithmetic
problems. Two important skills in using diagrams for problem solving seem to
be picking out a simple figure from a complex diagram, or visual
disembedding (Hadamard, 1954; Yakimanskaya, 1970), and recognizing an element of
a problem's diagram as the transformed image of a learned theorem's diagram
(Kabanova-Meller, 1970). Also important is the ability to represent the
information given in a problem by drawing a diagram. There is some evidence that
females do less well than males at drawing diagrams (Boe, 1968;
Mitchelmore, 1975).

In summary it seems that visual images are only one method that may
be used in solving mathematical problems to record the relationships among
elements of the problem. Some problem solvers visually transform these
elements into new combinations to arrive at a solution. Others rely more on
verbal or mathematical symbols to represent and transform the information
logically, sometimes with great success. However, Werdelin (1961) pointed out
that people who have both visual and verbal methods available are more likely
to solve problems successfully than those with only one method at hand.

Designing and Carrying Out the Study 

None of the studies discussed in the review of research examined
correlations between problem-solving performance and a full range of visual spatial
ability measures. Also, different types of problems were either not considered
at all or considered only after the fact. Very little research on use of diagrams
was found. Thus, by providing partial answers to the questions posed in the
introduction, the review makes it possible to replace them with more specific
hypotheses.
H1.  Boys and girls do not differ in their ability to solve
     mathematical problems.

H2.  There are no sex-related differences in performance on measures
     of any of the visual spatial abilities: two- or three-dimensional
     SR-O or Vz, or visual disembedding.

H3a. There are significant positive relationships between each of the
     visual spatial abilities and mathematical problem-solving
     ability.

H3b. The relationships are stronger for Vz tests than for SR-O tests
     and stronger for three-dimensional tests than for two-dimensional
     tests.

H3c. These relationships do not differ by sex.

H4.  Each type of visual spatial ability is more closely related to
     solving mathematical problems with high spatial content than
     those with little spatial content.

H5a. Boys and girls do not differ in their use of diagrams in solving
     mathematical problems.

H5b. Use of a diagram in solving a mathematical problem is positively
     related both to the ability to solve that problem and to
     visual spatial abilities.

H5c. There are no sex-related differences in these relationships.

The review of research indicates that sex-related differences in visual 
spatial abilities as well as in mathematics achievement begin to appear in 
grades 6, 7, and 8. For this reason and to avoid the complications caused by
different course offerings in mathematics in high school, junior high school
students were used as subjects in this study. The entire seventh-grade class of
the Fifth Street Junior High School in Bangor, Maine, was selected as the
sample. Bangor's population is among the most heterogeneous in the state and
its neighborhoods are small enough so that each junior high school
encompasses a number of socioeconomic levels. Its population is fairly conservative
with respect to sex roles, but a citizens' committee had been studying sex roles
in the public schools and reporting to the School Committee. Of the three
junior high schools in Bangor, Fifth Street was chosen as the most
representative by the director of testing and research for the school system because it was
always the median. Virtually all of the students in the sample were white and
spoke English as a first language. The choice of seventh grade rather than
eighth grade was made to avoid complications because some eighth-grade
students in this school take two semesters of algebra, some one semester, and some
none at all. The seventh-grade mathematics classes were tracked into two
levels (Level 1 being the upper one), but essentially the same material was
taught in each track. Data were collected in late spring of 1975.

A number of different considerations entered into the choice of tests of 
visual spatial abilities. They had to be short, easily scored, paper-and-pencil 
tests. Whenever advisable, tests in related studies were used to facilitate
comparison of results. To investigate both the SR-O, Vz division and the
dimensional division, a two-by-two matrix was constructed and a test chosen for each
cell. (See Figure 1.) Both two-dimensional spatial tests, Card Rotations and
Form Board 2, were chosen from the ETS Kit tests modified for use by the
NLSMA (Wilson, Cahen, & Begle, 1968d). The three-dimensional SR-O
test was Cubes Comparison, an ETS Kit test (French et al., 1969a, 1969b).
The three-dimensional Vz test was the Differential Aptitude Test (DAT)
Space Relations (Bennett, Seashore, & Wesman, 1972). The test of visual
disembedding chosen was Hidden Figures 2, also an ETS Kit test modified by
NLSMA (Wilson et al., 1968d). This is a two-dimensional test; the author
knows of no three-dimensional test of this factor.

Figure 1. A matrix of spatial tests used in this study.

Choosing a valid and reliable written test of mathematical problem
solving for the seventh-grade students was a problem. A decision to base the
test on problems from Zalewski's (1974) written-test item bank was made for
several reasons. In his study the written test did not quite account for 50% of
the variance on the interview test, but it came close, so concurrent validity was
considerable. Content validity appeared to be at least as high as that of
commercial tests. Since this study was designed to include different types of
problems as one of its dimensions, selecting problems from a pool was
preferable to using an intact test.

In order to investigate the relationships between sex and visual spatial
abilities with problems differing in amount of spatial content, three categories
were established.

A. Problems in which the stimulus (presentation of the problem) is
   partly pictorial or which require spatial or geometric skills or
   knowledge for solution.

B. Problems with a completely verbal stimulus in which spatial skills
   (such as visualizing the situation or drawing a diagram) may be
   useful but are not necessary for solution.

C. Problems which appear to have no spatial content. (In other
   words, any problems that do not fit into categories A or B.)
Four mathematics teachers, two male and two female, classified the entire
written test item pool into these categories. A reduced pool was formed of those
problems assigned to the same category by all four judges. Some problems
were eliminated because they were quite similar to items on the spatial tests or
because their content was judged unfamiliar to the subjects of the study by
their teachers. Finally, eight problems from each category were chosen from
the remaining pool. Since sex-related differences were to be investigated in this
study, it seemed appropriate to word the problems to control for sex bias.
Whenever possible the problem was made neuter, such as by replacing "boys"
or "girls" with "students." Where this was not possible names and pronouns
were adjusted so that there were equal numbers of male-acted and
female-acted problems in each category.
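
The selection rule just described (keep only the problems that all four judges placed in the same category) can be sketched in a few lines of Python. This is only an illustration: the problem labels and the ratings are hypothetical and are not data from the study.

```python
def unanimous_pool(ratings):
    """Return {problem: category} for problems on which every judge agreed."""
    pool = {}
    for problem, labels in ratings.items():
        # A problem survives only if the set of labels has a single member.
        if len(set(labels)) == 1:
            pool[problem] = labels[0]
    return pool


# Hypothetical classifications by four judges into categories A, B, and C.
ratings = {
    "P01": ["A", "A", "A", "A"],
    "P02": ["A", "B", "A", "A"],  # one dissenting judge, so it is dropped
    "P03": ["C", "C", "C", "C"],
    "P04": ["B", "B", "B", "B"],
}

reduced_pool = unanimous_pool(ratings)
print(reduced_pool)
```

Requiring unanimity shrinks the pool but leaves each retained problem with an unambiguous category label, which matters because the category later defines the subtests.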

All these measures were pilot tested with classroom-sized samples at
another junior high school in Bangor to see if the tests were appropriate for
seventh-grade students and to check the time necessary for administering the
tests. Reliabilities for the spatial tests ranged from .47 to .89. The coefficients
were not impressive but not much lower than those reported in the literature
for some eighth-grade groups. Also the pilot samples were small (16 to 24
students). Some additions were made to the instructions to ensure that the
subjects in the main study would understand the tasks. Since the problem test
had been newly constructed for this study it was examined item by item after
the pilot test. Two items were replaced by different ones.

In the main study data were gathered from 176 subjects in three
different testing sessions with makeup tests given a few days later. In the end there
were very few missing scores: four subjects missed session 1, none missed
session 2, and three missed session 3. The guidance office supplied information on
sex of students and IQ in stanines measured by the Otis-Lennon Form J (Otis
& Lennon, 1970) given in the fall of the sixth grade. Level of mathematics
class was supplied by the subjects. The five mathematics teachers who taught
seventh grade were interviewed to provide additional information about the
subjects' mathematics programs in the year they were tested. The most
important fact provided in these interviews was that three of the four Level 2
(lower) classes had had no geometry that year, although the teachers thought
that their students would be familiar with the geometric concepts used in the
problem-solving test.

After the data were gathered, the tests were scored. Card Rotations
and Cubes Comparison (French et al., 1969a, 1969b) were scored using the
number right minus the number wrong formula. For the other three spatial
tests [Form Board 2, Hidden Figures 2 (Wilson et al., 1968d), and DAT
Space Relations (Bennett, Seashore, & Wesman, 1972)], the score was the
number correct. Four types of scales were used for the test of mathematical
problem solving. Each of these was applied to the total set of 24 problems and
to each of the subtests generated by categorizing the problems A, B, or C. One
scale indicated the number of correct answers in each category and on the total
test: Problem Solutions A, Problem Solutions B, Problem Solutions C, and
Problem Solutions T. Another scale counted the number of problems for
which diagrams were drawn or for which existing diagrams were modified in a
rational way: Diagrams A, Diagrams B, Diagrams C, and Diagrams T. A
third scale told how many of these diagrams or modifications represented the
information in the problem correctly: Correct Representations A, Correct
Representations B, Correct Representations C, and Correct Representations
T. The last scale was a ratio of Correct Representations compared to
Diagrams expressed as a percent: Percent A, Percent B, Percent C, and Percent T.
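
The two derived scores in this paragraph, the rights-minus-wrongs score and the Percent scale, are simple enough to state as a short Python sketch. The function names are hypothetical, and the handling of a subject who drew no diagrams (returning `None`) is an assumption made here, since the text does not say how that case was treated.

```python
def rights_minus_wrongs(n_right, n_wrong):
    """Score used for Card Rotations and Cubes Comparison: right minus wrong."""
    return n_right - n_wrong


def percent_correct_representations(n_correct, n_diagrams):
    """Percent scale: Correct Representations as a percent of Diagrams.

    Assumption: when no diagram was drawn the ratio is undefined,
    so None is returned rather than 0.
    """
    if n_diagrams == 0:
        return None
    return 100.0 * n_correct / n_diagrams


# Hypothetical subject: 30 right and 6 wrong on a rotations test,
# and 2 correct representations among 5 diagrams drawn.
print(rights_minus_wrongs(30, 6))
print(percent_correct_representations(2, 5))
```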

There were some decisions to be made in the scoring of the diagrams.
In certain problems numbers had been used in a nominal sense to identify
elements in a group or in a linear order; these were judged to be pictorial
representations. Incomplete or erased diagrams were counted. Even after these
decisions were made, the scoring was somewhat subjective, so agreement with
other raters was desirable. Two other doctoral students in mathematics
education scored a sample of 25 tests. Intercoder reliability was computed separately
for each of these scales: solutions, diagrams, and correct representations. For
each of the first two scales 75 pairwise comparisons (3 coders × 25 tests) were
made and ratios of agreement (number of agreements ÷ 75) were computed
for each problem. Then the agreement ratios were averaged over the 24
problems to give a reliability coefficient for each scale; these were .99 for the
solutions scale and .97 for the diagrams scale. The procedure was the same for
the correct representations scale except that the divisor for each problem
varied according to the number of diagrams observed; the reliability coefficient for
this scale was .80. (See Table 1.) Wherever the number of disagreements
exceeded 1.00, the tests and scoring sheets were examined to determine the
source of disagreement. In some cases the original two coders had made
mistakes in following the scoring instructions; in others it was a matter of
interpretation, which was expected given the nature of this scale. However, on
Problem 6 the author realized that she had not recognized two types of correct
representations, so that problem was rescored on the whole group of 173 tests.
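
The agreement procedure can be sketched as follows. The sketch reads the "75 pairwise comparisons" as the three possible pairs of coders compared on each of the 25 tests; that reading is an assumption. The small score lists are hypothetical, while the fourteen ratios in `table1_ratios` are the per-problem ratios reported in Table 1 for the problems on which diagrams were observed.

```python
from itertools import combinations


def agreement_ratio(scores_by_coder):
    """Share of coder-pair comparisons that agree, for ONE problem.

    scores_by_coder holds one list per coder, with one score per test.
    """
    n_tests = len(scores_by_coder[0])
    pairs = list(combinations(range(len(scores_by_coder)), 2))
    agreements = sum(
        scores_by_coder[i][t] == scores_by_coder[j][t]
        for i, j in pairs
        for t in range(n_tests)
    )
    return agreements / (len(pairs) * n_tests)


def scale_reliability(ratios):
    """Average the per-problem agreement ratios into one coefficient."""
    return sum(ratios) / len(ratios)


# Hypothetical problem: 3 coders scoring 4 tests (1 = correct, 0 = incorrect).
toy_problem = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1]]
print(agreement_ratio(toy_problem))  # 10 agreements out of 12 comparisons

# Ratios listed in Table 1 for the 14 problems with observed diagrams.
table1_ratios = [.89, .86, 1.00, .97, .83, 1.00, .80,
                 .33, .78, .33, 1.00, .58, .78, 1.00]
print(round(scale_reliability(table1_ratios), 2))
```

Averaging the Table 1 ratios in this way reproduces the reported coefficient of .80 for the correct representations scale, which suggests that problems with no observed diagrams did not enter the average.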



Table 1

Agreement over Coders 1, 2, and 3 on Correct Representations

          Number of   Average      Average         Average of
          diagrams    number of    number of       agreement
Problem   observed    agreements   disagreements   ratios

    1        19         17.00          2.00           .89
    2        21         18.00          2.00           .86
    3         0
    4         8          8.00          0.00          1.00
    5        22         21.33           .67           .97
    6        24         20.00          4.00           .83
    7        10         10.00          0.00          1.00
    8         0
    9         5          4.00          1.00           .80
   10         0
   11         0
   12         1           .33           .67           .33
   13         9          7.00          2.00           .78
   14         1           .33           .67           .33
   15         3          3.00           .00          1.00
   16         0
   17         0
   18         0
   19         8          4.67          3.33           .58
   20         0
   21         0
   22         0
   23         6          4.67          1.33           .78
   24         9          9.00           .00          1.00

Average of agreement ratios over 24 problems          .80


Data Analysis

In designing the study three types of hypotheses were stated:
hypotheses about differences in performance, hypotheses about relationships, and
hypotheses about differences in relationships. Verifying them required a number
of different statistical methods, and some hypotheses were investigated using
more than one statistical technique. Although the sample was not randomly
selected, it was considered a random sample of a hypothetical population with
characteristics as described in the previous section, to justify the use of
statistics based on the assumption of random sampling.



Sex-related Differences in Performance on the Test Battery

This section deals with the investigations concerning (A)
differences in performance, specifically H1, H2, and H5a. Some refinement
was possible using data available on the level of mathematics class for each
student. Figure 2 indicates the sequence of the data analysis described in this
section.



[Figure 2: flowchart with numbered boxes: Descriptive Statistics on All Scales; t Tests on Means; Reliability Estimates; Item Analysis of the Problem-Solving Test; Sex-by-Level ANOVAs; Recomputation of Sex-by-Level ANOVA, Problem Solutions C; Sex-by-IQ ANOVAs.]

Figure 2. Sequence of data analysis for Part A.

1 and 2. Descriptive statistics and t tests on means. The results of these
analyses are reported in Table 2. Of all five spatial tests, only on Form Board 2
was there a significant sex-related difference favoring boys. Using the ω²
statistic (Hays, 1973), sex accounted for about 5% of the variance on that test.
Significant sex-related differences in favor of girls appeared on some of the
problem-solving subscales related to use of a diagram, where sex accounted for 3%
to 6% of the variance. There were no significant sex-related differences on the
Problem Solutions scales.


Table 2

Descriptive Statistics and Comparison of Means
of Boys, Girls, and Both for the Entire Test Battery

                                Number       Ranges                 Means              Standard deviations     Mean
                                  of                                                                        difference
Tests                           items   Boys Girls  Both    Boys   Girls    Both    Boys   Girls    Both     t value

 1. Form Board 2 (a)               24    17    14    17    6.36    4.90    5.66    3.68    2.92              3.31*
 2. Hidden Figures 2 (a)           16     9    10    10    2.42    2.72    2.56    1.76    2.38              -.93
 3. DAT Space Relations (a)        60    41    45    52   26.67   26.07   26.39    9.69    5.48               .41
 4. Problem Solutions A             8     8     7     9    1.67    1.57    1.67    1.45    1.41    1.42       .00
 5. Problem Solutions B             8     8     9     9    2.09    2.19    2.14    1.76    1.93    1.84       .35
 6. Problem Solutions C             8     9     9     9    3.34    3.00    3.18    1.93    2.02    1.98      1.13
 7. Problem Solutions T            24    19    20    20    7.10    6.87    6.99    4.30    4.41    4.35       .35
 8. Diagrams A                      8     8     9     9    2.78    3.51    3.13    1.49    1.93    1.75     -2.76**
 9. Diagrams B                      8     5     5     5    1.78    2.16    1.96    1.03     .94             -2.53*
10. Diagrams C                      8     2     2     2     .04     .12     .08     .21     .32     .27     -1.92
11. Diagrams T                     24    11    13    13    4.60    5.78    5.17    2.17    2.56    2.43     -3.25**
12. Correct Representations A       8     5     7     7     .89    1.22    1.04    1.03    1.41    1.23     -1.74
13. Correct Representations B       8     4     5     5     .78    1.11     .94     .78     .83     .81     -2.68**
14. Correct Representations C       8     2     2     2     .03     .08     .06     .18     .28     .23     -1.38
15. Correct Representations T      24     6     9     9    1.68    2.37    2.01    1.45    1.98    1.76     -2.59**
16. Percent A                     100   101   101   100   28.23   31.26   29.70   29.67   30.32   29.94      -.66
17. Percent B                     100   101   101   100   44.06   51.80   47.88   37.73   33.85             -1.42
18. Percent C                     100   101   101   100   60.00   77.78   71.43   54.77   40.10             -2.35*
19. Percent T                     100   101   101   100   34.27   38.77   36.44   25.98   25.87             -1.14
20. Card Rotations (b)            112   104   104   109   45.57   46.98   46.25   21.44   19.88              -.44
21. Cubes Comparison (b)           42    40    50    50    8.67    6.83    7.80    8.69    8.28              1.43

Note. Number of girls taking all tests = 83.
(a) Number of boys taking these tests = 89.
(b) Number of boys taking these tests = 93. Number of boys taking the remaining tests = 90.
*p < .05.

3. Reliability estimates. The value of these results depends on the
reliability of the tests. Reliability was estimated by the
Kuder-Richardson Formula 20 (Downie & Heath, 1970) for [...]
Hidden Figures 2 (.59), DAT Space Relations (.88), and the four
problem-solutions scales of the test of mathematical problem solving (.45, .04,
.65, and .80 for scales A, B, C, and T, respectively). A Pearson
product-moment correlation coefficient, corrected by the Spearman-Brown formula
(Downie & Heath, 1970), was used to calculate the reliability for the Cubes
Comparison test (.69), which has two equivalent but separately timed
subtests. There was no appropriate method of estimating the reliability of Card
Rotations or the scales concerned with diagrams.
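
For readers unfamiliar with the two reliability estimates named here, the following Python sketch shows the standard textbook forms of Kuder-Richardson Formula 20 and the Spearman-Brown correction for a test split into two halves. It is not the author's computation: the item matrix is hypothetical, and dividing by n (rather than n - 1) for the variance of total scores is a convention assumed here.

```python
def kr20(item_matrix):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores.

    item_matrix: one row per subject, one column per item.
    Assumption: the variance of total scores divides by n.
    """
    n = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n
    var_total = sum((x - mean_total) ** 2 for x in totals) / n
    # Sum over items of p * q, where p is the proportion passing the item.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def spearman_brown(r_half):
    """Step a correlation between two half-tests up to full-test length."""
    return 2 * r_half / (1 + r_half)


# Hypothetical scores of four subjects on three items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(scores))
print(spearman_brown(0.5))
```

The Spearman-Brown step is what makes a correlation between the two separately timed Cubes Comparison subtests usable as a reliability estimate for the whole test.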

4. Item analysis of the problem-solving test. This analysis was used
mainly to search for sex bias in individual items. Item difficulty was computed
separately for boys and girls, as the proportion of correct answers on that item
compared to the total number of responses. Sex bias was evaluated for each
item by computing a t ratio for the difference between the proportions. One
item was found to be significantly (p < .05) easier for boys than for girls.
This item asks which player has the best shooting record given a table of shots
attempted and shots made; boys may have been more familiar with the task
than girls. On the whole, however, the test of mathematical problems seemed
free of sex bias.
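
A ratio for the difference between two item difficulties can be sketched as below. The text does not give the formula that was used, so the pooled standard error shown here is an assumption (one common form of the test), and the counts are hypothetical.

```python
import math


def proportion_difference_ratio(correct_a, n_a, correct_b, n_b):
    """Difference between two proportions divided by its pooled standard error."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_error


# Hypothetical item: 60 of 90 boys and 40 of 83 girls answered correctly.
ratio = proportion_difference_ratio(60, 90, 40, 83)
print(round(ratio, 2))
```

With groups of this size, a ratio beyond roughly 1.96 in absolute value corresponds to a two-tailed difference at about the .05 level, so the hypothetical item above would be flagged as easier for boys.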

5. Sex-by-level ANOVAs. A χ² test showed that the actual
distribution of boys and girls in the two levels of mathematics classes differed from the
expected distribution at the .05 level. (See Table 3.) Since Level 1 students
might have had more opportunity than Level 2 students to learn the
mathematics needed for the problem-solving test (especially in geometry), and since
a disproportionate number of girls was found in Level 1, sex-related
differences on the problem-solutions scales or possibly even on the spatial tests
might have been obscured. To check this, two-way, sex-by-level ANOVAs
were performed on all scores except the percent scales, using only those
subjects with complete test data (170 of the original 176). First, a set of exact
ANOVAs with equal cell sizes of 32 was computed; the cells were made equal
by randomly eliminating subjects from the three cells with more than 32
subjects.

Table 3

Distribution of Boys and Girls
in Levels of Mathematics Classes

Level     Boys     Girls

  1        43        51
  2        50        32

Then, using the same random selection procedures, a second set of ANOVAs
with equal cell sizes of 32 was computed. Since there were substantial
differences between the two sets of F values, a set of ANOVAs with unequal cell
sizes was computed using the data of all 170 subjects. Table 4 presents the F
values generated by these three sets of ANOVAs. Results which were
significant on at least two of the ANOVAs for any test were considered important.
Others were regarded as artifacts of the samples or due to limitations of this
method of analysis.
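
The two procedural rules in this passage, random elimination down to equal cells of 32 and the "significant on at least two of the three ANOVAs" criterion, can be sketched in Python. The subject identifiers are hypothetical, and the cell counts simply follow Table 3 for illustration; the ANOVAs themselves used only the subjects with complete data, whose exact cell counts are not given.

```python
import random


def equalize_cells(cells, size, rng):
    """Randomly keep `size` subjects in every cell of the design."""
    return {
        name: sorted(rng.sample(members, size))
        for name, members in cells.items()
    }


def is_important(significant_flags):
    """A result counts as important if significant on at least two ANOVAs."""
    return sum(bool(flag) for flag in significant_flags) >= 2


# Hypothetical subject IDs; cell sizes follow Table 3 for illustration only.
cells = {
    ("boys", 1): list(range(0, 43)),
    ("girls", 1): list(range(100, 151)),
    ("boys", 2): list(range(200, 250)),
    ("girls", 2): list(range(300, 332)),
}

first_sample = equalize_cells(cells, 32, random.Random(1))
second_sample = equalize_cells(cells, 32, random.Random(2))

print({name: len(ids) for name, ids in first_sample.items()})
print(is_important([True, False, True]))
```

Drawing the equal-cell sample twice with different seeds mirrors the two equal-cell analyses described above; because each draw discards different subjects, the two sets of F values can differ, which is the instability the third, unequal-cell analysis was meant to address.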

In general the results of the three ANOVAs were the same as those
generated by the t tests: boys performed better on Form Board 2; girls did
better on Diagrams A and T. The girls' advantage on Diagrams B
disappeared on all ANOVAs, as did their superiority on Correct Representations B
and T on all but the ANOVA using all the data. In addition, the ANOVAs
turned up some interesting sex-by-level interactions on Diagrams T and
Correct Representations T. Figure 3, a graph of the cell means on Diagrams T for
the ANOVA using all the data, shows that the sex-related difference on that
scale was due to superior performance by girls from the Level 1 classes. What
makes this graph noteworthy is that the graphs of the cell means for all the
scales involving diagrams or correct representations for all three ANOVAs are
similar to this one, although few of the interactions are significant at the .05
level.

6 and 7. Recomputation of the t test and sex-by-level ANOVA: Problem
Solutions C. Another significant (p < .05) difference which appeared on



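[Figure 3: line graph of cell means on Diagrams T (vertical axis, 4 to 7) at
Level 1 and Level 2 (horizontal axis) for girls (G) and boys (B). Legible
point labels: G = 6.84, G = 4.09, B = 4.02.]

Figure 3. Sex-by-level interaction: cell means on Diagrams T, ANOVA using all data.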
Table 4

                                Equal cell sizes              Equal cell sizes              Unequal cell sizes
                                (1st sample)                  (2nd sample)                  (using all data)
Tests                           Sex       Level     Sex by    Sex       Level     Sex by    Sex       Level     Sex by
                                                    level                         level                         level

 1. Form Board 2                4.63*     2.52      <1        7.49**    <1        6.62*     9.51**    <1        3.01
 2. Hidden Figures 2            <1        2.12      2.12      <1        5.08*     4.70*     <1        5.46*     2.80
 3. DAT Space Relations         <1        8.45**    <1        <1        6.58*     <1        <1        9.98**    <1
 4. Card Rotations              <1        9.33**    1.01      <1        4.00*     <1        <1        5.78*     <1
 5. Cubes Comparison            2.20      10.48**   <1        1.10      8.83**    <1        2.48      10.68**   <1
 6. Problem Solutions A         <1        16.60**   <1        <1        13.42**   <1        <1        22.91**   <1
 7. Problem Solutions B         1.45      39.35**   <1        <1        49.56**   3.56      <1        52.74**   2.10
 8. Problem Solutions C         8.77**    60.64**   <1        6.28*     65.22**   <1        6.53*     80.05**   <1
 9. Problem Solutions T         5.19*     64.81**   <1        1.41      66.[?]9** <1        2.78      86.01**   <1
10. Diagrams A                  [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  3.14      4.11*     28.88**   3.33
11. Diagrams B                  <1        9.52**    2.67      2.05      11.59**   1.57      3.81      13.20**   3.50
12. Diagrams C                  2.19      4.93*     <1        1.09      3.02      1.09      1.84      5.33*     1.30
13. Diagrams T                  5.09*     37.99**   6.78*     3.86      28.30**   3.86      6.39*     34.32**   5.36*
14. Correct Representations A   <1        18.55**   3.40      1.74      16.88**   5.77*     1.25      22.07**   3.23
15. Correct Representations B   1.20      10.80**   <1        2.59      13.55**   <1        4.97*     16.21**   1.97
16. Correct Representations C   1.87      1.87      <1        <1        <1        <1        1.17      2.34      <1
17. Correct Representations T   1.89      24.25**   3.36      3.51      23.55**   5.36*     3.99*     1.15      4.42*

df                              (1, 124)                      (1, 124)                      (1, 166)

*p < .05.
**p < .01.
all three ANOVAs was in favor of boys on Problem Solutions C. This difference
was also visible, although not significant, on the t test. Since the one
problem showing significant sex bias was a C-type problem, the author recomputed
some of the statistics for the Problem Solutions C scale eliminating the biased
problem. The t ratio was reduced to .65 and the F value in the second sample
was reduced to 2.58; both were no longer significant.

Relationships between Visual Spatial Abilities and Mathematical
Problem-solving Ability and Sex-related Differences in these Relationships
(Part B)

In this section the relationships between the spatial and mathematical
variables are examined and the question of whether or not these relationships 
are the same for both sexes is discussed. Specifically, hypotheses H3a, H3b, 
H4, H5b, and H5c are investigated. In order to look at the relationships from 
several points of view a number of statistical analyses were performed. Figure 
4 indicates the sequence of these analyses. 



[Figure 4: flow diagram with boxes labeled Correlation Coefficients;
Significance of Differences in Correlation Coefficients; Factor Analysis;
Regression Analysis; Canonical Correlation Analysis.]

Figure 4. Sequence of data analysis for Part B.



1. Correlation coefficients. Correlation matrices for boys, girls, and
the whole group were computed using the spatial scores, the problem-solving
subscales, and IQ stanines of the 170 subjects who had taken all of the tests.
For girls and for the whole group the relationships between all the spatial tests
and all the problem-solutions scales were significant and positive. For
boys only DAT Space Relations (Bennett et al., 1972) was significantly
related to all the solutions scales. In addition, for boys, Cubes Comparison and
Card Rotations (French et al., 1969a, 1969b) were significantly related to
Problem Solutions C and T, but there were no other significant relationships
between the spatial variables and solution scales. For girls and for the
whole group the spatial scales, except Form Board 2 (French et al., 1969a,
1969b), were significantly related to Correct Representations A, B, and T. For
boys only DAT Space Relations and Card Rotations were significantly related
to correctly representing problems with diagrams. The relationships between
correctly representing a problem with a diagram and solving that problem are
summarized in Table 5. As expected, the size of the correlation coefficients was
directly related to the type of problem.






Table 5

Correlation Coefficients Selected to Show the Relative
Importance of Using Diagrams in Solving the Three Categories of Problems

                                 Girls     Boys      Both

Problem Solutions A
  Diagrams A                     .362      .[?]05    .326
  Correct Representations A      .702      .662      .669

Problem Solutions B
  Diagrams B                     .359      .109      .234
  Correct Representations B      .538      .235      .399

Problem Solutions C
  Diagrams C                     .055      -.011     .107
  Correct Representations C      .108      -.066     .028



2. Significance of differences in correlation coefficients. It appeared
that the correlation coefficients were generally larger for girls than for boys,
especially between the spatial and mathematical variables. To investigate the
significance of these differences, a Fisher r to Z transformation (Downie &
Heath, 1970) was performed on each coefficient and the differences tested for
significance using a t ratio. Of the total of 231 pairs tested one could expect
that, by chance, 12 would be significantly different at the .05 level. While the
actual results (17 at the .05 level) were not much above the chance
expectation, the pattern of results is interesting. All but three of the 17 correlations on
which boys and girls differed involved drawing diagrams for C-type problems,
which supposedly have no spatial content. For the boys this was related to
Form Board 2; for girls it was related to other diagram-drawing scales. The
results of this analysis turned up very few differences, but the relationships
may contain other sex-related differences that were too slight to detect.
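
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration of the standard Fisher r-to-Z test for two independent correlations, not the author's own computation: the function names and the group sizes (80 girls, 90 boys) are assumptions chosen for the example, while the two coefficients (.538 and .235) are the Table 5 correlations between Correct Representations B and Problem Solutions B. The normal approximation is used for the test statistic.

```python
import math


def fisher_z(r):
    """Fisher r-to-Z transformation: Z = atanh(r)."""
    return math.atanh(r)


def compare_independent_correlations(r1, n1, r2, n2):
    """Return (ratio, two-tailed p) for the difference between two
    correlations from independent groups (normal approximation)."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    ratio = (fisher_z(r1) - fisher_z(r2)) / standard_error
    p_value = math.erfc(abs(ratio) / math.sqrt(2.0))
    return ratio, p_value


def expected_chance_count(pairs, alpha):
    """Number of 'significant' differences expected by chance alone."""
    return pairs * alpha


# Group sizes are hypothetical; the coefficients are taken from Table 5.
ratio, p_value = compare_independent_correlations(0.538, 80, 0.235, 90)
print(round(ratio, 2), round(p_value, 3))

# 231 pairs tested at the .05 level, as in the text (about 12 expected).
print(expected_chance_count(231, 0.05))
```

Under these assumed group sizes the girls' and boys' coefficients differ at the .05 level; with smaller groups the same pair of coefficients would not.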

The relative importance of Vz and SR-O tests or two- and three-
dimensional tests in predicting mathematical problem-solving ability was
also investigated using this method. For the whole group three-dimensional
tests of both types were somewhat better predictors of the total
problem-solutions scale than two-dimensional tests, but the difference was significant only
for the Vz tests. (See Table 6.) The same pattern was observed with all the
solutions subscales, although the p-values of the differences were not as small
as .05.



Table 6

Correlations Between the Spatial Variables
and Problem Solutions Total for the Whole Group

                        SR-O      Vz        t ratio(a) (SR-O - Vz)

2-dimensional           .350      .234      1.41
3-dimensional           .395      .446      .76
t ratio (3-D - 2-D)     .57       2.72*

(a) t ratio for the significance of the difference between correlation
coefficients, correlated data (Downie & Heath, 1970).

*p < .01.



3. Factor Analysis. To reduce the data a series of factor analyses was
performed. The method used in the CAA Project (Harris & Harris, 1973)
and by Meyer (1976) suggested the possibility of using several different types
of factor analysis, each with orthogonal and oblique rotations, and then finding
factors common among them. Principal Components Analysis (P Comp A)
and Principal Factor Analysis (P Fact A) were the two methods chosen
(Evanson, 1975); more factors were extracted by the latter method than the
former. Four factors emerged consistently in all four analyses: a Solutions
factor, a Space factor, and two Drawing factors, one for A- and B-type problems
and the other for C-type problems. (See Tables 7-10.) The tables list
variables under a factor if their loading on that factor exceeded .30 for any analysis
for either sex.

Several things should be noticed about the Solutions factor. One
involves the loadings of Diagrams A and Correct Representations A. Indeed, in
P Fact A the Solutions factor split into two subfactors, one for A-type
problems and one for the other types. This supports the idea that solving
A-type problems is dependent on using diagrams effectively. A second point is
that the girls' loading of Hidden Figures 2 is higher than the boys'. This could
indicate a sex-related difference in problem-solving styles with Hidden
Figures items. Perhaps more boys used visual methods and more girls used
logical methods. In P Fact A, Hidden Figures 2 had its own factor. The girls'
loadings on Problem Solutions B were substantially less than those of the boys.

A single factor emerged in the P Comp A with all the spatial tests
except Hidden Figures 2 loading heavily on it. In the P Fact A a subfactor split
Table 7
The Solutions Factor

                                Principal components analysis   Principal factor analysis
                                Orthogonal     Oblique          Orthogonal            Oblique
Variables                       Girls  Boys    Girls  Boys      Girls     Boys        Girls     Boys

1. Problem Solutions A          72a    86      70     84        57 43     37 68       49 41     29 62
2. Problem Solutions B          39     71      37     65        38 --     63 35       37 --     54 --
3. Problem Solutions C          60     62      60     54        59 --     72 --       62 --     72 --
4. Hidden Figures 2             72     26      78     32        -- --     -- --       -- --     -- --
5. Diagrams A                   40     50      37     26        -- 67     -- 53       -- 17     -- 38
6. Correct Representations A    72     78      65     65        27 75     11 82       -- 42     -- 81

a Decimal points omitted.
Table 8
The Space Factor

                                Principal components analysis   Principal factor analysis
                                Orthogonal     Oblique          Orthogonal     Oblique
Variables                       Girls  Boys    Girls  Boys      Girls  Boys    Girls     Boys

1. Cubes Comparison             75a    77      75     76        61     61      42 13     61 08 21
2. DAT Space Relations          69     74      67     71        56     69      35 13     51
3. Form Board 2                 71     59      71     56        58     52      13 45     19 36
4. Card Rotations               64     71      60     70        55     5[?]    03 33     32 18
5. Problem Solutions B          52     41      41     31        48     35      19 21     20 20
6. Problem Solutions A          38     06      --     --        29     14      --        --
7. Problem Solutions C          34     28      --     --        27     17      --        --
8. Correct Representations A    30     01      --     --        22     13      --        --

a Decimal points omitted.
Table 9

The First Drawing Factor (Problem Types A and B)

                                Principal components analysis   Principal factor analysis
                                Orthogonal     Oblique          Orthogonal     Oblique
Variables                       Girls  Boys    Girls  Boys      Girls  Boys    Girls     Boys

1. Diagrams A                   5[?]   65      46     71        37     53      63 13     48
2. Diagrams B                   86     85      87     87        65     76      24 52     72
3. Correct Representations A    34     29      19     41        20     22      39 04     19
4. Correct Representations B    81     87      79     88        69     77      11 60     80
5. Problem Solutions B          46     06      --     --        46     10      --        --

a Decimal points omitted.
Table 10

The Second Drawing Factor (Problem Type C)

                                Principal components analysis   Principal factor analysis
                                Orthogonal     Oblique          Orthogonal     Oblique
Variables                       Girls  Boys    Girls  Boys      Girls  Boys    Girls     Boys

[Diagrams C]                    91a    94      92     95        87     [?]     [?]       [?]
Correct Representations C       93     91      94     92        86     90      88        85
[Form Board 2]                  -16    36      -18    36        -09    25      -08       3[?]
Hidden Figures 2                -09    31      -02    [remaining entries illegible]

a Decimal points omitted.
Bracketed row labels are missing in the source and are inferred from the text.
along dimensional lines was indicated. Problem Solutions B had a substantial
loading on the spatial factor for girls and a more modest loading for boys. The
First Drawing factor was for A- and B-type problems. In the P Fact A with
oblique rotation, the girls' factor split neatly into two subfactors, one for each
type. The appearance of Problem Solutions B on this factor for girls suggests
that B-type problems were more closely linked to spatial skills for girls than
for boys in this sample. The Second Drawing factor was for C-type problems,
which appeared to have no spatial content. Form Board 2 had modest loadings
on this factor for boys.

4. Regression Analysis. To investigate the relative importance of each
of the spatial tests in predicting scores on the problem-solving scales, two
regression analyses were carried out; in each analysis the data for boys and for
girls were treated separately. The first regression analysis used all the spatial
tests including Hidden Figures 2; in the second analysis this test was omitted
because it had not appeared in the Space factor (see Table 8).

Tables 11 through 14 list the standardized regression coefficients and
coefficients of determination for each of the problem-solving regression
analyses. In each case the regression equation predicted more of the variance for
girls than for boys. The spatial variables were more important predictors of
solving B-type problems than A- or C-type problems. For boys DAT Space
Relations was generally the only important predictor for all except the C-type
problems. For girls all except Form Board 2 and perhaps Card Rotations were
important. With the exception of Correct Representations A and T, none of
the other regression equations accounted for as much as 20% of the variance,
so the tables containing those regression coefficients are not included. Overall,
DAT Space Relations was the best predictor of the scales for drawing
diagrams. On the scales for correct representation and percent the girls' equations
usually had several significant coefficients. The boys' usually had only one: for
Correct Representations A and T it was DAT Space Relations, for Correct
Representations B it was Card Rotations, and for Correct Representations C
it was Form Board 2.

Preliminary comparison of correlation coefficients indicated that the
three-dimensional tests were better predictors of the problem-solutions scales
than the two-dimensional tests. Regression analysis was used to investigate this
hypothesis further. Tables 15 and 16 give the coefficients of determination for
predicting the problem-solutions scales from the two- and three-dimensional
tests separately. Comparison shows that the three-dimensional tests accounted
for more variance on each scale than the two-dimensional tests. The
differences, which ranged from 3% to 15%, were larger for girls than for boys.
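
The size of these differences can be checked directly from the coefficients of determination reported in Tables 15 and 16. The sketch below simply subtracts the two-dimensional value from the three-dimensional value for each scale and sex; the dictionary layout and variable names are illustrative assumptions, and the numbers are transcribed from the two tables.

```python
# (two-dimensional, three-dimensional) coefficients of determination,
# transcribed from Tables 15 and 16 for girls and boys on each scale.
r_squared = {
    "A": {"girls": (0.12, 0.20), "boys": (0.05, 0.08)},
    "B": {"girls": (0.19, 0.30), "boys": (0.13, 0.20)},
    "C": {"girls": (0.07, 0.19), "boys": (0.06, 0.13)},
    "T": {"girls": (0.18, 0.33), "boys": (0.11, 0.16)},
}

# Gain in explained variance when three-dimensional tests replace
# two-dimensional tests as predictors.
gains = {
    (scale, sex): round(three_d - two_d, 2)
    for scale, by_sex in r_squared.items()
    for sex, (two_d, three_d) in by_sex.items()
}

smallest = min(gains.values())
largest = max(gains.values())
girls_larger = all(
    gains[(scale, "girls")] > gains[(scale, "boys")] for scale in r_squared
)

for key in sorted(gains):
    print(key, gains[key])
print(smallest, largest, girls_larger)
```

The computed range and the girls-versus-boys comparison agree with the statement in the text.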
Table 11

Standardized Regression Coefficients
Problem Solutions A

                                Regression 1          Regression 2
Variables                       Girls     Boys        Girls     Boys

Hidden Figures 2                .18*      .11         --        --
DAT Space Relations             .19       .23*        .20*      .23*
Cubes Comparison                .23*      -.04        .24       -.04
Card Rotations                  .16       .06         .16       .08
Form Board 2                    -.02      .07         .00       .07
Coefficient of determination    .26       .10         .23       .09

*p < .10.
**p < .05.



Table 12

Standardized Regression Coefficients
Problem Solutions B

                                Regression 1          Regression 2
Variables                       Girls     Boys        Girls     Boys

Hidden Figures 2                .18**     .09         --        --
DAT Space Relations             .22**     .32***      .23**     .32***
Cubes Comparison                .28       .10         .29*      .11
Card Rotations                  .17       .17         .20*      .18*
Form Board 2                    .04       -.02        .06       -.03
Coefficient of determination    .38       .24         .35       .23

*p < .10.
**p < .05.
***p < .01.
Table 13

Standardized Regression Coefficients
Problem Solutions C

                                Regression 1          Regression 2
Variables                       Girls     Boys        Girls     Boys

Hidden Figures 2                .21**     .03         --        --
DAT Space Relations             .36*      .05         [missing] .05
Cubes Comparison                .05       .26         .05       .26**
Card Rotations                  .02       .15         .05       .15
Form Board 2                    .05       -.07        .07       -.07
Coefficient of determination    .24       .13         .20       .13

*p < .10.
**p < .05.
***p < .01.



Table 14

Standardized Regression Coefficients
Problem Solutions T

                                Regression 1          Regression 2
Variables                       Girls     Boys        Girls     Boys

Hidden Figures 2                .23       .09         --        --
DAT Space Relations             .23*      .32***      .23*      .33*
Cubes Comparison                .22*      .15         .22**     .15
Card Rotations                  .14       .16         .17       .17
Form Board 2                    .03       -.02        .06       -.02
Coefficient of determination    .41       .19         .36       .18

*p < .10.
**p < .05.
***p < .01.
Table 15

Standardized Regression Coefficients
Two- and Three-dimensional Comparison
Problem Solutions A and B

                                  Problem Solutions A        Problem Solutions B
Variables                         Girls   Boys    Both       Girls   Boys    Both

Two-dimensional
  Form Board 2                    .12     .13     .12        .21     .08     .11
  Card Rotations                  .28     .15     .22        .32     .33     .33
  Coefficient of determination    .12     .05     .07        .19     .13     .15

Three-dimensional
  DAT Space Relations             .24     .28     .27        .29     .37     .33
  Cubes Comparison                .28     -.01    .19        .34     .14     .24
  Coefficient of determination    .20     .08     .14        .30     .20     .25


Table 16

Standardized Regression Coefficients
Two- and Three-dimensional Comparison
Problem Solutions C and T

                                  Problem Solutions C        Problem Solutions T
Variables                         Girls   Boys    Both       Girls   Boys    Both

Two-dimensional
  Form Board 2                    .15     -.02    .09        .21     .07     .10
  Card Rotations                  .17     .25     .21        .30     .30     .32
  Coefficient of determination    .07     .06     .06        .18     .11     .16

Three-dimensional
  DAT Space Relations             .39     .08     .23        .38     .28     .33
  Cubes Comparison                .08     .28     .19        .27     .18     .23
  Coefficient of determination    .19     .13     .11        .33     .16     .24


5. Canonical Correlation Analysis. To obtain a clearer picture of the
relationship between the composite for each type of problem-solving scale and
the spatial variables, a canonical correlation analysis was run on the boys' and
girls' data separately. The spatial tests, except for Hidden Figures 2, were
used as the independent variables; the three subscores for each type of problem
scale were used as the dependent variables in the four different analyses. The
only canonical correlation which accounted for more than five percent of the
variance in the dependent variable was the first one, the problem solutions
composite. (See Table 17.) This canonical correlation summarizes a number
of results of the regression analysis. The canonical correlation accounted for a
larger part of the girls' correlation than the boys', and scores for both A- and
B-type problems were related to that part of the variance predicted by the
spatial variables. In the boys' correlation only the scores for B-type problems
were important. In the girls' correlation three spatial tests shared the
prediction; for boys, DAT Space Relations was most important.

Table 17

Canonical Correlation Analysis
Spatial Composite with Problem Solutions Composite

                                              Standardized coefficients
Variables                                     Girls       Boys

Dependent
  Problem Solutions A                         .39         .06
  Problem Solutions B                         .72         .99
  Problem Solutions C                         .05         -.03

Independent
  Form Board 2                                .08         -.04
  DAT Space Relations                         .41         .67
  Card Rotations                              .34         .38
  Cubes Comparison                            .48         .20

Amount of the variance in the dependent
variables accounted for by this canonical
correlation                                   13.31%      7.78%



Conclusions

The following is a summary and interpretation of the results of this
study in the context of previous and concurrent research. The conclusions are
organized in terms of the hypotheses stated in the design section.

H1. Boys and girls do not differ in their ability to solve mathematical
problems.

The bulk of the literature reviewed indicated that in seventh and
eighth grades, boys are better than girls at solving problems, especially
geometric problems or practical problems with spatial content. This seemed
especially true at the upper ability levels. In this sample, none of the differences in
the problem-solutions scales evaluated by the t test were significant at the .05
level. Significantly better performance by boys on C-type problems was found
in the sex-by-level ANOVAs. The possible better performance by boys on
spatially oriented problems suggested by the background review does not seem
relevant as these were problems with no apparent spatial content. The F
values for the sex-by-level interaction were less than 1.0, and examination of
cell means showed that these differences were found at both ability levels,
discrediting any hypothesis of superior male performance at higher ability levels.
The possible explanation is that the difference is due to sex bias in problem
content. When the only problem on which the difficulty was significantly
different for boys and girls (a C-type problem about sports) was removed, the
sex-related difference in favor of boys on C-type problems was no longer
significant, lending support to this explanation. When Meyer (1976), in the
study reported in Chapter 9, analyzed her data by sex she found no significant
sex-related differences on any of the Romberg-Wearne problem-solving scales.
A middle school study by Fennema and Sherman (1976) also used the
Romberg-Wearne problem-solving scales; there was a sex-related difference in only
one of the four geographical areas of the city used in the study. In summary, it
appears that sex-related differences in problem solving in mathematics are
disappearing.

H2. There are no sex-related differences in performance on measures
of any of the visual spatial abilities: two- or three-dimensional SR-O or Vz or
visual disembedding.

Some of the studies reviewed earlier showed sex-related differences
occurring in or before grade seven. Meyer's (1976) separate analysis of her
cognitive abilities data indicated that boys performed better than girls
(p < .01) on Spatial Relations, a form-board type test. Fennema and
Sherman (1976) found no significant sex differences on DAT Space Relations in
their middle school study, and in an earlier study using the same test Sherman
and Fennema (1977) found significant differences in only two of four high
schools participating. The results of this study fit with those cited above.
Sex-related differences on Form Board 2 in favor of boys were significant at the .01
level, but there were no differences on DAT Space Relations, Hidden Figures
2, Card Rotations, and Cubes Comparison. As on tests of mathematical
problems, it appears that differences in spatial ability are not as widespread as
earlier studies indicated. However, further research on form-board tests
would be useful.

H3a. There are significant positive relationships between each of the
visual spatial abilities and mathematical problem-solving ability. H3b. The
relationships are stronger for Vz tests than SR-O tests and stronger for
three-dimensional tests than for two-dimensional tests. H3c. These relationships do
not differ by sex.

These hypotheses were investigated in a number of ways in this study.
The correlation coefficients between all the spatial variables and the
problem-solutions scales were significant and positive for girls; for boys many but not all
of the correlation coefficients were significant. The problem-solutions scales
had low but significant loadings on the spatial factor for girls but not for boys
in the factor analysis. In the regression analysis the spatial variables predicted
more of the variance on the solutions scales for girls than for boys. Hidden
Figures 2 had high loadings on the Solutions factor for girls but not for boys
and contributed more to the regression equations for girls than for boys. The
closer relationship between mathematical problem solving and spatial abilities
for girls than for boys was also suggested by Meyer's (1976) separate factor
analysis of her data. She found that Spatial Relations was relevant to only one
factor for boys but to two factors, one of which seemed to involve mathematical
problem solving, for girls.

It is the author's hunch that these sex-related differences in the
relationship between solving mathematical and spatial problems are a result of
differences in method or style of solving spatial items. Both visual and logical
methods of solving spatial items were reported in the background section, as
was the close connection between spatial tests and tests of figural reasoning.
The close connection between figural or abstract reasoning and mathematical
problem solving has been similarly noted. If girls as a group, more often than
boys, solve spatial items logically, then the relationships between spatial and
mathematical tests should be closer for girls than for boys. Since
problem-solving style was not studied directly here, this remains a hunch.

A hypothesis generated by the results of the NLSMA (Wilson &
Begle, 1972a, 1972b) and the Project Talent study (Flanagan et al., 1964) is
that Vz tests are better predictors of mathematical problem solving than SR-O
tests. Indeed DAT Space Relations, a Vz test, often had the largest coefficient
in the regression equation for the problem-solutions scales, especially for boys.
Cubes Comparison was the next best predictor, especially for girls. Given the
fact that several different methods have been reported for solving Cubes items,
it might be that Cubes Comparison was a Vz test for this sample. On the other
hand, Form Board 2, also a Vz test, was the least valuable predictor. It seems
reasonable that the three-dimensional characteristic was the important factor.
This was supported by the results of the regression equation. The
three-dimensional character of Cubes Comparison and DAT Space Relations may
have forced this seventh-grade group to use logical methods more often than
they did on the two-dimensional tests. This too remains a hunch about
problem-solving style.

The suggestion was made in the introduction and reinforced in the
background section that poorer performance by females on tests of
mathematical problem solving might be due to deficiencies in spatial skills. The results of
this study argue against that suggestion. The one spatial test on which there
was a significant difference in favor of boys, Form Board 2, was the least
important in the regression equations for both sexes.

H4. Each type of visual spatial ability is more closely related to solving
problems with much spatial content than those with little spatial content.

The curious result of this part of the investigation was that the solution
of B-type problems was more closely related to the spatial variables than the
solution scale for A-type problems, especially for girls. This appeared in the
factor analysis where Problem Solutions B loaded substantially on the Space
factor and in the regression analysis where the spatial variables predicted
more of the variance on Problem Solutions B than on Problem Solutions A and
C. One possible explanation for this involves the stimuli for the A- or B-type
problems. Five of the eight A-type problems have pictorial stimuli; by
definition none of the B-type problems do. It may be that in solving B-type problems
subjects used spatial skills more often to visualize the situation in the problem
or to organize the information given than they did in A-type problems where
the visual organization was already presented. However, this is purely
speculative.

H5a. Boys and girls do not differ in their use of diagrams in solving
mathematical problems. H5b. Use of a diagram in solving a mathematical
problem is positively related both to the ability to solve that problem and to
visual spatial abilities. H5c. There are no sex-related differences in these
relationships.

An unexpected result of this study was that Level 1 girls scored higher 
on the diagrams and correct representations scales than Level 1 boys, while at 
Level 2 there was no sex-related difference. All the Level 1 classes had studied 
some geometry but only one of the four Level 2 classes had geometric instruc- 
tion. It may be that girls, who are supposedly more successful at school, were 
applying what they had learned more often and more successfully. The fact 
that this difference was significant for the A-type problems but only a trend for 
the others supports this explanation. 

Both the comparison of correlation coefficients and the factor analysis
showed that the relationship between drawing a diagram for a problem, espe- 
cially a diagram that represents the information correctly, and solving that 
problem was closer for A-type problems than other types. This was expected 
because of the way the problems were categorized. What was unexpected was 
that the spatial variables were better predictors of the solution scales than of 
the three types of drawing scales except for Correct Representation A. This 
seems again to indicate that the spatial tests involved a substantial reasoning 
component for this seventh-grade sample. 

Certainly all the questions raised in this study have not received their 
final answers. Replication with other groups is always desirable. More infor- 
mation on the relationship between visual spatial abilities and mathematical 
problem solving might have been obtained if some of the spatial tests had not 
been so difficult for these students. The author noticed while scoring the prob- 
lem-solving test that some subjects appeared to misunderstand the geometric 
concepts in the A-type problems although the teachers had affirmed their 
classes' familiarity with these concepts before testing. A
comprehension-application measure like the Romberg-Wearne test (Wearne, 1976) would have
been valuable. Although information was gathered on the presence and
strength of relationships between problem solving in mathematics and visual
spatial abilities, the cognitive processes used in solving these items remain
undetermined. Ultimately these solution processes may be researched through
subjects' own accounts of their methods of solving spatial and mathematical 
problems in the thinking aloud procedures described elsewhere in this volume. 
Despite the limitations in this study, there are implications for
educational practice. Recent studies have shown few, if any, sex-related differences
in mathematical problem-solving ability, cautioning educators against the
myth of male superiority at mathematical reasoning. Such obsolete sex-biased
views might be partly responsible for the small number of women electing
advanced mathematics courses in high school or college. The results of this
study also show the importance of constructing tests free of sex bias. While the
test used in this study was designed to be neutral with respect to male and
female actors, the results of the item analysis indicate that problem content
should also be considered. Although sex-related differences in spatial skills did
not seem to yield sex differences in ability to solve mathematical problems,
these two abilities were related. Spatial training in mathematics classes and
more specific instruction in drawing and using diagrams should be
encouraged, especially in light of the frequency of diagramming found in this
study and in others such as those by Meyer and Zalewski reported in this
volume.

In conclusion, it appears that the inferior abilities attributed to females
in spatial and mathematical areas should be reevaluated, just as the inferiority
of women has been challenged in legal, social, and economic matters. This is
not to say that equality has been achieved. However, the changes that have
occurred suggest that further change is possible.



Chapter 11 



Relationships Between Selected 
Noncognitive Factors and the 
Problem-solving Performance of 
Fourth-grade Children 

Donald R. Whitaker 

Among the variables presumed related to success in problem solving
are attitudes, values, interests, appreciations, adjustments, temperament, and
personality. Such variables have been termed noncognitive to contrast them
with the cognitive variables of intelligence, aptitude, achievement, and
performance. This study investigated the relationships between selected
noncognitive factors and the mathematical problem-solving performance of
fourth-grade children.



The Nature of the Problem 

Clues about the noncognitive factors that influence problem-solving
performance may be found by examining factors thought to influence overall
mathematics achievement. Students' attitudes toward a school subject are
thought to affect their achievement in that subject. Likewise, educators believe
that teachers' attitudes toward a subject can influence their students' attitudes
and achievement in that subject. Research findings, while sometimes
inconsistent and inconclusive, usually show low, positive correlations between student
and teacher attitudes toward mathematics and student achievement in
mathematics (see Phillips, 1973; Torrance, 1966; Wess, 1970). These findings raise
the question of cause and effect. Do teachers' attitudes cause student attitudes,
or is the effect perhaps in the other direction?

Since an individual's overall mathematics achievement is a composite 
of achievement in several areas, attitude toward mathematics may also be a 
composite of attitudes toward aspects of mathematics such as computation and 
problem solving. Researchers, however, have tended to use single, global mea- 
sures of attitude rather than investigating attitude toward only one phase of 
mathematics (see, for example, Dutton, 1962; Phillips, 1973; Reys & Delon, 
1968). The study reported here examined the relationships between both stu- 
dent and teacher problem-solving attitudes and student performance in mathe- 
matical problem solving. 

Though research findings vary, there is evidence of sex-related
differences in mathematics (for example, see chapters by Meyer and Schonberger in
this monograph). These findings suggested including sex as a variable in the
present study. Fourth-grade students and teachers were selected as subjects for
the study, since some research suggests that attitudes toward mathematics are
formed during the intermediate grades (Callahan, 1971; Fedon, 1958; Stright,
1960).

The Analysis of Mathematics Instruction Project at the University of
Wisconsin Research and Development Center for Cognitive Learning has
developed an elementary mathematics program called Developing Mathematical
Processes (DMP) (Romberg, Harvey, Moser, & Montgomery, 1974, 1975,
1976). The DMP program is a research-based, activity-oriented approach to
teaching and learning mathematics in grades K-6. One of the basic goals of
DMP is the development of mathematical problem-solving skills and
processes. While a DMP staff member, the author worked with a number of
teachers and students in DMP schools and was impressed by the manner in
which students attack problems and by the positive attitude both students and
teachers seemed to have toward the DMP program (Montgomery & Whitaker,
1975). Therefore, the sample for this study involved students and teachers
who had participated in the large-scale field test of DMP. For comparison, a
non-DMP sample of students and teachers was included.

Key Terminology Used in the Study 

For this study a problem is a situation which presents an objective that
an individual is motivated to achieve, although no immediate procedures are
available to arrive at that objective (Zalewski, 1974, p. 2). The situation in
each problem is mathematical in nature. Problem solving is the process of
analyzing the situation posed in a problem, producing a solution procedure,
using that procedure, and achieving a solution to the problem. Mathematical
problem-solving performance is represented by a score on a mathematical
problem-solving test.

As used in this study, attitude is the predisposition of an individual to
evaluate some symbol, object, or aspect of his or her world in a favorable or
unfavorable manner (Katz, 1967). In particular, attitude toward problem
solving is the predisposition of an individual to evaluate factors related to
mathematical problem solving in a relatively favorable or unfavorable manner
and is represented by a score on an attitude scale.

The Questions of the Study and Their Significance 

The first two questions this investigation was designed to answer
pertained to the attitudes of the subjects of the study:

Question 1: Do fourth-grade students have favorable attitudes toward
problem solving? (Do differences in attitude exist if
students are classified by sex or program type: DMP versus
non-DMP?)

Question 2: Do fourth-grade teachers have favorable attitudes toward
problem solving? (Do differences in attitude exist if
teachers are classified by type of program taught: DMP
versus non-DMP?)

Educators generally desire that students and teachers hold favorable 
attitudes toward all phases of the school program, so the findings of the study 
help to determine if this is the case. Directional relationships between prob- 
lem-solving attitudes and problem-solving performance were analyzed in the 
second part of the study.

The problem-solving performance of the participating students was of 
major importance for several questions of the study. Question 3 deals with that 
issue: 

Question 3: How do fourth-grade students perform on a test of
problem-solving performance which provides measures of
comprehension, application, and problem solving? (Do
differences in problem-solving performance exist when
students are classified by sex or by program type: DMP
versus non-DMP?)

Most tests of problem-solving performance have provided a single
score measuring each student's ability to solve problems, but such scores are
inadequate to explain why some students are successful at solving a set of
problems and others are not. The Romberg-Wearne Problem-solving Test
(Wearne, 1976) used in this study was designed to overcome this inadequacy.

Assessing attitudes toward problem solving is justified if there is reason
to suspect that these attitudes are related to performance. The fourth and fifth
questions of the study pertain to that relationship.

Question 4: What is the relationship between fourth-grade students'
attitudes toward problem solving and their performance
in problem solving? (Do differences in this relationship
exist if students are classified by sex or by program type:
DMP versus non-DMP?)

Question 5: What is the relationship between fourth-grade teachers' 
attitudes toward problem solving and their students' per- 
formance in problem solving? (Do differences in this re- 
lationship exist if students are classified by sex or by pro- 
gram type: DMP versus non-DMP?) 

Past studies have not examined the relationship between attitude and
performance in problem solving or any other single phase of the mathematics
curriculum. If problem-solving attitudes and performance are highly related,
then research into other specific phases of the curriculum is mandated.

Educators generally believe teacher attitude and effectiveness in a
particular subject to be important determinants of student attitudes and
performance in that subject (Aiken, 1969). However, research findings pertaining to
this belief have not been definitive. The last two questions of the study were
directed at this cause-effect relationship.

Question 6: Do fourth-grade teachers' attitudes toward problem
solving affect their students' problem-solving performance or
is the effect of the opposite nature? (Do differences exist
when students are classified by sex?)

Question 7: Do fourth-grade teachers' attitudes toward problem
solving affect their students' attitudes toward problem solving
or is the effect of the opposite nature? (Do differences
exist when students are classified by sex?)

It is reasonable to suspect that students' attitudes and performance 
might affect teachers' attitudes, instead of the relationship being only in the 
other direction. It is important, then, to gain information on which source,
the teacher or the student, has the greater effect on the other's attitude and
performance.

Related Literature 

A review of the recent related problem-solving literature is given in
Chapter 2 of this monograph. This section of the present chapter includes an
overview of recent attitudinal research and summarizes studies having
particular significance for this investigation.

The Nature of Attitudes 

Most definitions (see Allport, 1967) indicate that attitude is a learned
state of readiness, a predisposition to react in a particular way toward certain
stimuli. Important to any study is the idea that attitude involves both cognitive
and noncognitive components (that is, both beliefs and feelings) and, to
some extent, a behavioral component. For example, a student's attitude
toward mathematics is a composite of intellectual appreciation coupled with
emotional and behavioral reactions to the subject.

In a condensation of theoretical formulations about attitudes, Scott
(1968) suggests that the concept has as many as 11 variable properties. Of
particular importance to this assessment of attitudes toward mathematics are
the properties of direction (Does the individual generally like or dislike
mathematics?) and intensity (How strongly does the individual feel about this
attitude?).

The Measurement of Attitudes

A number of techniques are available to assess attitudes. Corcoran and
Gibb (1961) describe several of the measures of attitudes toward
mathematics, including questionnaires, attitude scales, incomplete sentences, projective
pictures, essays, observational methods, and interviews. Of these techniques,
perhaps the most widely used are the attitude scales. The most popular types
of scales are the Thurstone scale (Thurstone, 1928), the semantic differential
(Osgood, Suci, & Tannenbaum, 1957), and the Likert scale (Likert, 1932),
the type used in the present study. Other but less popular measures include
biographical and essay studies (Campbell, 1950) and the monitoring of
galvanic skin responses of subjects (Cooper & Pollock, 1959). Still other
researchers argue for a multiple-indicator approach (Cook & Selltiz, 1964),
which infers attitude from subjects' behavior rather than making direct
measurements.

Elementary School Students' Attitudes Toward Mathematics 

A number of attempts have been made to establish the relationship 
between attitude and student achievement in mathematics. Studies by
Poffenberger and Norton (1959) and by Shapiro (1962) found low positive
correlations between the two. The results of the extensive National Longitudinal
Study of Mathematical Abilities (NLSMA) suggested a relatively stable
pattern of positive correlations of mathematics attitude scores with both
mathematics achievement scales and mathematics grades in each population of the
study (Crosswhite, 1972). On the other hand, studies by Anttonen (1969),
Cleveland (1962), and Faust (1963) failed to support the belief that there is a
positive correlation between attitude and achievement in mathematics. Some
positive correlation between attitude and achievement in mathematics. Some 
research has linked general intelligence with attitude toward mathematics 
(Crosswhite, 1972; Shapiro, 1962). 

Evidence suggests that attitudes toward mathematics may be formed as
early as the third grade (Callahan, 1971; Fedon, 1958; Stright, 1960),
although these attitudes tend to be more positive than negative in elementary
school. Interestingly, there is evidence of a decline from third through sixth
grades in the percentage of students who express negative attitudes toward
mathematics (Crosswhite, 1972; Stright, 1960).

At the elementary school level attitude toward mathematics and 
achievement in mathematics are related to a number of personality variables, 
such as good adjustment, high sense of personal worth, greater sense of
responsibility, high social standards, motivation, high academic achievement, and
freedom from withdrawal tendencies (Naylor & Gaudry, 1973; Neufeld,
1968; Swafford, 1970). Children with positive attitudes toward mathematics
tend to like detailed work, to view themselves as more persevering and
self-confident (Aiken, 1972), and to be more "intuitive" than "sensing" in
personality type (May, 1972). When attitude scores are used as predictors of
achievement in elementary school mathematics, a low but significant positive
correlation is usually found (Evans, 1972; Mastantuono, 1971; Moore,
1972).

Elementary Teachers and Attitudes Toward Mathematics 

Unfortunately, many prospective teachers seem to have unfavorable
attitudes towards mathematics (Dutton, 1962; Reys & Delon, 1968).
However, preservice mathematics content and methods courses for prospective
elementary teachers seem to improve attitudes toward mathematics (Gee, 1966;
Reys & Delon, 1968; White, 1965; Wickes, 1968).

Attitudes of elementary teachers toward mathematics are generally
less positive than those of secondary school mathematics teachers (Wilson,
Cahen, & Begle, 1968). On the other hand, Stright (1960) concluded that a
large percentage of elementary teachers enjoy teaching arithmetic and attempt
to make the subject interesting. Brown (1962) concluded that experienced
teachers had more positive attitudes toward arithmetic and possessed a better
understanding of the subject than did less experienced teachers. Todd (1966)
found that a state-wide inservice course produced significant changes in
teacher attitudes toward arithmetic and in arithmetic understanding.

Teacher Attitude as Related to Student Attitude and Achievement

Educators generally believe that teacher attitude and effectiveness in a
particular subject are salient determinants of student attitudes and
performance in the subject. Poffenberger (1959) concluded that teachers who tend to
affect students' attitudes and achievement positively have a good knowledge of
and interest in the subject, a desire to have students understand, and good
control of the class.

The relationship between teacher attitude and student achievement in
mathematics has been verified more often than has the connection between
teacher attitude and student attitude. Torrance (1966) concluded that teacher
effectiveness had a positive effect on student attitudes toward teachers,
methods, and overall school climate. Phillips (1973) found that teacher attitude for
2 of the past 3 years, especially most-recent teacher attitude, was significantly
related to student attitude toward mathematics. On the other hand, studies by
Caezza (1970), Van de Walle (1973), and Wess (1970) found no
statistically significant relationships between teacher attitudes and either the
attitudes or attitude changes of their students.

Attitudes Toward Problem Solving 

Several years ago Brownell (1942) observed that favorable student
attitudes toward problem solving are a desirable and obtainable educational
outcome. More recently, Polya (1965) has stressed the importance of
favorable teacher attitudes in helping students acquire problem-solving
proficiency. In a publication by the Ontario Institute for Studies in Education
(1971) the following observation is made:

Granted that problem solving is both a desirable and an essential part of
school mathematics, it seems a necessary prerequisite for successful
development of problem solving skills that both teacher and student have
positive attitudes to problems. (p. 35)

Aiken (1970) has called for more intensive investigations of attitudes 
toward mathematics and has suggested that an individual's attitude toward 
one aspect of the discipline, such as problem solving, may be entirely different 
from his or her attitude toward another aspect, such as computation. The fol- 
lowing is a review of the work of the few researchers who have investigated 
problem-solving attitudes. 

A problem-solving attitude scale for college students. Though Carey 
(1958) was interested in general problem solving, rather than mathematical 
problem solving, her study is important because it represents a first attempt to 
construct a problem-solving attitude scale. She constructed a reliable instru- 
ment with a Likert-type format to measure attitudes toward problem solving. 
The use of this scale enabled her to conclude that college-age men and women 
differ in attitudes toward problem solving and that problem-solving perfor- 
mance is positively related to problem-solving attitude. 

A Brazilian study of problem-solving attitudes. Lindgren, Silva,
Faraco, and DaRocha (1964) studied attitudes toward problem solving as a
function of success in arithmetic in Brazilian elementary schools using a
24-item adaptation of the Carey (1958) scale. An arithmetic achievement test, a
general intelligence test, and a socioeconomic scale also were administered to 
fourth-grade students. Favorable problem-solving attitudes were positively 
and significantly correlated with arithmetic achievement, although the correla- 
tions were rather low. Problem-solving attitudes of the students showed near- 
zero correlations with intelligence test scores and socioeconomic status. Unfor- 
tunately, Lindgren et al. did not correlate problem-solving attitudes with per- 
formance in problem solving. 

A problem-solving inventory for children. Of particular interest to the
present study is the work by Covington (1966) who devised instruments to 
assess problem-solving competency among upper elementary school children. 
This effort resulted in the development of the Childhood Attitude Inventory for 
Problem Solving (CAPS). This inventory is designed to indicate children's 
beliefs about the nature of the problem-solving process, attitudes toward cer- 
tain aspects of problem solving, and degree of self-confidence in dealing with 
problem-solving tasks. Though CAPS itself does not assess attitudes toward 
mathematical problem solving, it holds promise as a model for similar instru- 
ments related to mathematical problem solving. 

Concluding Remarks

The complex nature of both attitudes and mathematical problem
solving makes the search for definitive answers about the natures of each variable
and their relationships tedious and frustrating. At best, the evidence about
the two variables is inconclusive, and research into their relationships is almost
nonexistent.



Design and Conduct of the Study 

The study was planned to be conducted in two parts with samples from 
two different populations, as depicted in Figure 1.



                    Time 1                "Treatment"            Time 2

DMP sample          Measures A, B, & C*   Study of DMP topics    Measures A, B, & C*
(Parts I and II)

Non-DMP sample      Measures A, B, & C*
(Part I only)

*Measure A: Student Mathematical Problem-solving Test
 Measure B: Student Mathematical Problem-solving Attitude Scale
 Measure C: Teacher Mathematical Problem-solving Attitude Scale

Figure 1. The design of the study.

The following is a description of the instruments and specific details of 
the design. 

The Instruments of the Study 

Three instruments were used in the present study: (a) a mathematical 
problem-solving test, (b) a student mathematical problem-solving attitude 
scale, and (c) a teacher mathematical problem-solving attitude scale. The
mathematical problem-solving test was developed by Romberg and Wearne 
(Wearne, 1976) and is described in more detail in Chapter 8. 

Efforts to develop reliable scales to measure attitude toward problem 
solving have met with reasonable success. However, existing scales were either 
inappropriate or unavailable for use in this study. Therefore, two
problem-solving attitude scales were developed by the author.

Construction of the student attitude scale. Nunnally (1967) has
observed that if verbalized attitude is to be measured, the content validity of the
instrument is the major issue. Furthermore, he maintains that content validity
is best ensured by a representative collection of items and sensible instrument
construction. Therefore, a procedure similar to that used in developing the
NLSMA attitude scales (see Romberg & Wilson, 1969) was followed by the
author in constructing the student mathematical problem-solving attitude
scale. First, a pool of 82 Likert-type items was constructed. Each item was
intended to measure some aspect of fourth-grade students' attitudes toward
mathematical problem solving. Next, the list of items was submitted to a panel
of reviewers for careful scrutiny. Any item rejected by at least two reviewers
was discarded. This procedure yielded a 40-item pilot scale with Likert
format.

Pilot test and item analysis of the student attitude scale. The pilot
version of the student mathematical problem-solving attitude scale was
administered by the author to 51 fourth-grade students. Item responses for each
student were coded on a five-point scale, ranging from five for the most favorable
response to one for the most unfavorable response. Total scale scores could
vary from 200 for the most favorable attitude to 40 for the most unfavorable
attitude. A score of 120 signified a neutral attitude. Mean total response score
was 142.9. Cronbach's alpha (Cronbach, 1951), a measure of internal
consistency reliability of the instrument, was .90 for the total scale.

The revised student scale. Following the analysis of the pilot test
results, a revised, two-part student mathematical problem-solving scale was
developed. Part I had 12 items in a happy/sad faces response format and was
designed to provide an informal measure of a student's attitude. An example of
the items in this part is shown in Figure 2. Part II consisted of 24 items in
modified Likert format providing a formal and more specific measure of
attitude. An example of a formal item is shown in Figure 3. The revised scale, as a
whole, provides a composite measure of a number of variables which influence
a fourth-grade student's attitude toward mathematical problem solving (see
Whitaker, 1976).

If we spent more time in school doing math problems, I would be

[row of happy/sad faces]

Figure 2. Example of a mathematical problem-solving attitude item with "happy/sad
faces" format.

After I read a problem, I like to think about what I know and what I don't know in the
problem.

REALLY AGREE
AGREE
CAN'T DECIDE
DISAGREE
REALLY DISAGREE

Figure 3. Example of a mathematical problem-solving attitude item with modified
Likert format.



Construction of the teacher attitude scale. A procedure nearly identical
to that used for the student attitude scale was adopted for construction of the
teacher scale. First, a pool of 70 Likert-type items was written, each item to
measure some aspect of an elementary teacher's attitude toward mathematical
problem solving. Many statements were similar in content and wording to
those written for the student scale. The pool was submitted to the same panel
of reviewers who examined the student items. Any item rejected by at least two
reviewers was again discarded. The same five-part response format (really
agree, agree, can't decide, disagree, and really disagree) was used on the
teacher scale.

Pilot test and item analysis of the teacher attitude scale. A 50-item pilot
version of the teacher mathematical problem-solving attitude scale was admin- 
istered by the author to 28 elementary school teachers. A five-point coding 
scheme was adopted for each response so that the maximum possible score on 
the scale was 250, indicative of a very favorable attitude. A score of 150 indi- 
cated a neutral attitude, while a score of 50 meant a very unfavorable attitude. 
Mean total score for the pilot sample was 181.5. Internal consistency 
(Cronbach, 1951) of the teacher scale was .96. 

The revised teacher attitude scale. After revisions, the teacher
mathematical problem-solving attitude scale used in the study consisted of 40 items
in Likert format. Thirty-one pilot scale items were used and nine other items
were added reflecting the teaching of problem-solving skills and processes. The
total scale provides a composite measure of an elementary teacher's attitude
toward mathematical problem solving (see Whitaker, 1976).

Part I of the Study

The first part of the study dealt with questions 1-5 formulated earlier
in this chapter. The sample and procedures for this part are outlined below.

The sample. Subjects in the sample for Part I of the study were 30
fourth-grade teachers and their fourth-grade mathematics classes. Fifteen of
the teachers and students were participants in the large-scale field test of the
Developing Mathematical Processes (DMP) (Romberg et al., 1975,
1976) program. They were using the commercial fourth-grade DMP
materials during the 1975-76 school year. The 15 DMP classes were in six
Wisconsin schools. The remaining 15 fourth-grade classes were in seven other
Wisconsin schools. These teachers and students were using mathematics programs
other than DMP.

The procedures. During the second week of December 1975, the three 
instruments of the study were administered by the author and a testing spe- 
cialist to the DMP students and teachers. Testing was carried out in the class- 
rooms of the participating schools on two different days; the mathematical 
problem-solving test was given on the first day and the attitude scales on the 
second day. The non-DMP testing was begun during the second week of
January 1976 and completed early in the fourth week of that month. Procedures
similar to those used with the DMP sample were followed with the non-DMP
sample. 

Part II of the Study 

The second part of the study dealt with questions 6 and 7 posed earlier 
in this chapter. The paragraphs below describe the sample and procedures for 
this part of the study.

The sample. Subjects were the same fourth-grade teachers and their
mathematics students in the DMP sample of Part I. Unfortunately, because of
a teacher resignation, the second part of the study was conducted with only 14
classes. The non-DMP teachers and students did not participate in Part II.

The procedures. The study involved two different testing periods
(Time 1 and Time 2) with an intervening "treatment" period. The first testing
period was described above. The second testing period began during the
second week of March 1976 and ended during the last week of that month.
Testing at Time 2 was conducted in the classrooms of the participating schools,
again on two different school days. Tests were administered by the author and
the testing specialist who assisted at Time 1. The mathematical
problem-solving test was administered on the first day. This test was an alternate version of
that used at Time 1, except that each item on the second version had a
multiple-choice response format. The mathematical problem-solving attitude scales
(with rerandomized items) were given the day after the problem-solving test.
The intervening "treatment" period between Time 1 and Time 2 lasted
approximately 12 weeks, although the duration could not be controlled precisely
because of scheduling difficulties. The "treatment" consisted of instruction in
the regular sequence of DMP topics for fourth grade, with the restriction that
teachers select at least one topic from the problem-solving strand of DMP.
Without exception, teachers elected to cover DMP Topic 57, The Numbers
0-999,999.

Findings for Part I 

Five main questions were the foci around which the first part of the
study was conducted and the data were analyzed. Below is a discussion of the
data and findings for each question in turn. 

Findings for Question 1 

The first question of the study was: Do fourth-grade students have 
favorable attitudes toward problem solving? To answer this question the
36-item mathematical problem-solving attitude scale was administered to
students in the sample. Item responses for each student were coded on a five-point
scale ranging from five for the most favorable response to one for the most
unfavorable response. A total scale score of 180 represents the most favorable
attitude toward problem solving, a score of 108 signifies a neutral attitude, and
a score of 36 represents the most unfavorable attitude. The 12 items in Part I
of the scale measure students' reactions to general types of mathematics
problems, with possible scores ranging from 60 for most favorable to 12 for
most unfavorable; a score of 36 represents a neutral attitude. The 24 items in
Part II of the scale assess students' reactions to specific problem situations and
problem-solving techniques, and scores can range from 120 for most favorable
to 24 for most unfavorable, with a score of 72 indicating a neutral attitude.
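
The score ranges quoted above follow directly from the five-point coding: each total is the number of items multiplied by 1, 3, or 5. The short Python sketch below reproduces that arithmetic. The function name score_range and its parameters are illustrative assumptions, not part of the original study.

```python
def score_range(n_items, low=1, high=5):
    """Return (most unfavorable, neutral, most favorable) totals
    for a scale of n_items items coded from low to high.

    Hypothetical helper for illustration; the neutral total assumes
    the midpoint response (3 on a five-point scale) on every item.
    """
    midpoint = (low + high) // 2
    return n_items * low, n_items * midpoint, n_items * high


# Parts of the revised student scale described in the text.
for label, n_items in [("Part I", 12), ("Part II", 24), ("Total", 36)]:
    print(label, score_range(n_items))
```

The same helper applies to the 40-item scales, where the text gives 40, 120, and 200 as the unfavorable, neutral, and favorable totals.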

Table 1 gives a summary of the mathematics problem-solving attitude
scores for the 619 students who responded to the scale. Scores ranged from
unfavorable to very favorable on each of the two parts of the scale and on the
total scale, but each mean score was closer to indicating a favorable attitude
than a neutral attitude. Thus, the fourth-grade students in the sample seemed
to possess favorable attitudes toward mathematical problem solving. When the
data were analyzed by sex and by program type (DMP versus non-DMP), no
significant differences in results were observed.



Table 1

Mathematical Problem-solving Attitude Scores
of Students in Sample Population (N = 619)

Scale part           Minimum    Maximum    Mean      SD

I (Informal)           12.0       60.0      43.7      8.4
II (Formal)            38.0      116.0      85.9     12.9
Total (Composite)      52.0      176.0     129.6     18.9
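
One way to read the statement that each mean was closer to a favorable than to a neutral attitude is to divide each mean in Table 1 by the number of items in that part of the scale. The sketch below does this under the assumption, not stated explicitly in the text, that an item response of 4 corresponds to "favorable" and 3 to "neutral"; the variable names are illustrative.

```python
# Means from Table 1 paired with the number of items in each part of the scale.
table1_means = {"Part I": (43.7, 12), "Part II": (85.9, 24), "Total": (129.6, 36)}

def per_item_mean(total_mean, n_items):
    """Average response per item on the 1-to-5 coding."""
    return total_mean / n_items

item_means = {
    part: round(per_item_mean(mean, n_items), 2)
    for part, (mean, n_items) in table1_means.items()
}
print(item_means)
```

Under that assumption, a per-item mean above 3.5 lies closer to the favorable code than to the neutral code.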



Cronbach's alpha (Cronbach, 1951) was computed for each part of
the scale. For Part I, the reliability coefficient was .85; for Part II it was .82;
the total scale reliability coefficient was .88. These reliability estimates were 
judged to be quite satisfactory. 
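
For readers unfamiliar with the coefficient, the following is a minimal sketch of the standard Cronbach's alpha formula, k/(k - 1) multiplied by (1 - sum of item variances / variance of total scores). The function name and the small data set are hypothetical and invented for illustration only; they are not the study's data or its computing procedure.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses (4 students x 3 items), invented for illustration.
demo = [
    [5, 4, 5],
    [3, 3, 4],
    [2, 2, 1],
    [4, 4, 4],
]
print(round(cronbach_alpha(demo), 3))
```

Sample variances are used for both the items and the totals; using population variances throughout gives the same coefficient, because the common factor cancels in the ratio.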

Findings for Question 2 

The second question of the study was: Do fourth-grade teachers have 
favorable attitudes toward problem solving? To answer this question a teacher 
mathematical problem-solving attitude scale was administered to the 30
teachers in the sample. Thirty-one of the 40 items on the scale assess
teachers' reactions to types of mathematics problems and problem situations,
and frustration or anxiety experienced when solving problems. The remaining
items assess teachers' feelings about teaching various problem-solving skills
and processes. The total scale provides a composite measure of an elementary
school teacher's attitude toward mathematical problem solving. Item responses
are coded on a five-point scale, ranging from a score of five for the most
favorable response to one for the most unfavorable response. A total scale
score of 200 represents the most favorable attitude toward problem solving, a
score of 120 signifies a neutral attitude, and a score of 40 indicates the
most unfavorable attitude.

The attitude scores of the teachers in the sample ranged from slightly 
favorable to very favorable, evidenced by a minimum recorded score of 134 and 
maximum recorded score of 175. Mean score for the sample was 156.5 (stan- 
dard deviation of 9.6), indicating that the teachers possessed favorable atti- 
tudes toward mathematical problem solving. When teacher attitudinal data 
were analyzed by type of mathematics program taught (DMP versus non-DMP),
the difference in mean attitude scores was not significant.

The internal consistency of the teacher attitude scale was .80. Though 
the reliability estimate was lower than anticipated, it was judged satisfactory
given the relatively small sample size. 

Findings for Question 3 

The third question investigated was: How do fourth graders perform 
on a test of problem-solving performance which provides measures of compre- 
hension, application, and problem solving? Three separate scores were re- 
ported for each student responding to the mathematical problem-solving test 
(Wearne, 1976). Students were able to solve correctly more of the
comprehension items than application items and more of the application items
than problem-solving items; this result was expected since it reflects the
order of difficulty of the items. The problem-solving items are the most
difficult and are problems in the sense of the definition given in Chapter 1.
As shown in Table 2, of a total of 22 three-part items on the test, the mean
number of problems solved correctly by the students was 15.00, 9.50, and 3.19
for comprehension, application, and problem solving, respectively. Most of the
students, then, could not be classified as good at solving problems of the
type specified by the definition. A more detailed discussion of student
performance on the problem-solving test may be found in Chapter 8.

When the data were grouped by sex, differences in problem-solving
performance were not significant. However, when scores were analyzed by
program type, DMP students performed significantly better (p < .01) than the
non-DMP students on the comprehension and application parts of the
problem-solving test. The difference in performance for the problem-solving
part of the test was not significant.



Table 2

Mathematical Problem-solving Performance Scores
of Students (N = 611)

                   Number of    Minimum/
Items              items        Maximum     Mean      SD

Comprehension         22          2/22      15.00     3.5
Application           22          1/20       9.50     3.9
Problem Solving       22          0/15       3.19     2.5



Findings for Question 4 

The fourth question was: What is the relationship between fourth- 
grade students' attitudes toward problem solving and their performance in 
problem solving? The correlation matrix calculated to determine this relation- 
ship is presented in Table 3. Correlations between the three student attitude 
scores and the three problem-solving scores are shown. Significant positive
correlations (p < .01) existed between each of the attitude scores and each of
the problem-solving scores. Aside from the strong intercorrelations between
the various parts of each instrument, the strongest correlations were found
between students' Part II attitude scores and their comprehension,
application, and problem-solving scores. When the data were grouped by sex,
there was a significant positive relationship (p < .05) between attitude and
performance for both boys and girls.



Table 3

Correlation Matrix for Students' Mathematical Problem-solving
Attitudes and Mathematical Problem-solving Performance
(N = 579)

                  Attitude   Attitude   Total      Compre-    Appli-     Problem
                  I          II         attitude   hension    cation     solving

Attitude I        1.00
Attitude II        .55*      1.00
Total attitude     .82*       .93*      1.00
Comprehension      .12*       .24*       .21*      1.00
Application        .15*       .31*       .27*       .69*      1.00
Problem solving    .15*       .25*       .23*       .49*       .69*      1.00

*Significant at p < .01 as determined by Fisher Z-transformation (see Hays,
1973).
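
The footnote to Table 3 refers to the Fisher Z-transformation. The chapter does not spell out the computation, so the following is only a sketch of one common form of the test, assuming a two-tailed test of a zero population correlation with the normal approximation; the function name `fisher_z_test` is illustrative.

```python
import math

def fisher_z_test(r, n):
    """Two-tailed test of H0: rho = 0 using Fisher's z-transformation.

    Returns (z statistic, p-value) under the normal approximation,
    where atanh(r) has standard error 1 / sqrt(n - 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2))    # two-tailed normal p-value
    return z, p

# Smallest attitude-performance correlation in Table 3, with N = 579.
z, p = fisher_z_test(0.12, 579)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With a sample of this size even a correlation as small as .12 falls below the .01 level, which is why statistical significance in Table 3 should not be read as a strong relationship.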



Correlations calculated for the student data categorized by program
type indicated a positive relationship between problem-solving attitude and
performance for both groups. Correlations for the DMP sample ranged from
.03 to .17, with six of the nine correlations between attitude and performance
significant at the .05 level. For the non-DMP sample, the correlations were
somewhat higher, ranging from .18 to .43, with all correlations significant at
the .05 level. Thus, there appeared to be a stronger relationship between
student problem-solving attitude and problem-solving performance for the
non-DMP sample than for the DMP sample. Exploratory analyses with data from
the DMP sample suggested that students with high problem-solving performance
have problem-solving attitudes considerably higher than average, while
those students with low performance have lower than average attitudes.

Findings for Question 5

The fifth question investigated was: What is the relationship between
fourth-grade teachers' attitudes toward problem solving and their students'
performance in problem solving? Correlations between teachers' attitudes and
the mean problem-solving performance of the students in their classes were
found to be consistently very weak, negative, nonsignificant, and in the range
of -.05 to -.08. Thus, for the 30 classes in the sample, there appeared to be little
observable relationship between teacher problem-solving attitude and student
problem-solving performance. No significant differences were found when the
data were analyzed by sex of the students.

Surprising and almost unbelievable results were found when
correlations were computed on the basis of program type. For the non-DMP sample
the correlations between teacher attitude and mean student problem-solving
performance ranged from .16 to .19 and were nonsignificant. However, for the
DMP sample, substantial negative correlations were found; they ranged from
-.47 to -.59 and two of the three were significant at the .05 level. In an attempt
to explain negative correlations of this proportion, several exploratory
analyses were undertaken. Scatter plots were drawn to show the relationship
between teacher attitude scores and mean student scores on each of the three
parts of the problem-solving test. The scatter plots and accompanying
regression lines did, indeed, verify the negative nature of these relationships. Since
these correlations were based on a relatively small and nonrandom sample,
and since the attitudes of all teachers were favorable and the variance in scores
was slight, the negative relationships were judged an artifact of this particular
population.

Findings for Part II 

The second part of the study was directed at questions 6 and 7 posed 
earlier in this chapter. The basic plan for Part II involved problem-solving 
testing at two different times with an intervening "treatment" period. Only the
DMP sample of teachers and students was involved in this part of the study. 

Findings for Question 6 

The sixth question of the study was: Do fourth-grade teachers'
attitudes toward problem solving affect their students' problem-solving
performance, or is the effect of the opposite nature? The cross-lagged panel
correlational technique recommended by Campbell and Stanley (1963) was used for
this part of the study, since simple correlational procedures cannot answer
questions of cause and effect. As shown in Table 4 the correlations between
student problem-solving performance at Time 1 and teacher problem-solving
attitude at Time 2 were significantly more positive than the correlations
between teacher attitude at Time 1 and student performance at Time 2. Thus,
initial mean student problem-solving performance seemed to have a greater 
effect on final teacher attitudes than initial teacher attitudes had on final mean 
student problem-solving performance. 
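
The cross-lagged comparison described above can be sketched as follows, using the r12 and r21 labels of Table 4 (r12: Time 1 teacher attitude with Time 2 student performance; r21: Time 2 teacher attitude with Time 1 student performance). The function names and the class-level numbers are hypothetical and invented for illustration; they are not the study's data, and the sketch omits the significance test of the difference between the two correlations.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_lagged(teacher_t1, teacher_t2, student_t1, student_t2):
    """Return (r12, r21): teacher at Time 1 with student at Time 2,
    and teacher at Time 2 with student at Time 1."""
    r12 = pearson_r(teacher_t1, student_t2)
    r21 = pearson_r(teacher_t2, student_t1)
    return r12, r21

# Hypothetical data for five classes: teacher attitude scores and
# class mean problem-solving scores at the two testing times.
teacher_t1 = [150, 160, 145, 170, 155]
teacher_t2 = [152, 158, 149, 168, 157]
student_t1 = [3.1, 2.8, 3.5, 2.4, 3.0]
student_t2 = [3.4, 3.0, 3.9, 2.9, 3.3]

r12, r21 = cross_lagged(teacher_t1, teacher_t2, student_t1, student_t2)
print(f"r12 = {r12:.2f}, r21 = {r21:.2f}")
```

In this design the larger of the two cross-lagged correlations is taken to indicate the predominant direction of influence, which is the reasoning applied to Table 4.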

Cross-lagged panel correlations were also calculated for the data
grouped by sex of students. The same directional relationships as in the total
sample were noted for girls, but for boys, the relationship was apparent only
for the comprehension and application parts of the problem-solving test.



Table 4

Cross-lagged Correlations:
Time 1 Teacher Attitude with Time 2 Student
Performance (r12) and Time 2 Teacher Attitude with Time 1
Student Performance (r21)

Cross-lagged
correlations     Comprehension    Application    Problem solving

r12                  -.72             -.72             -.69
r21                  -.25*            -.50*            -.53*

*Significant at p < .01 as determined by Fisher Z-transformation (see Hays,
1973).

Findings for Question 7 

The final question of the study was the following: Do fourth-grade
teachers' attitudes toward problem solving affect their students' attitudes
toward problem solving, or is the effect of the opposite nature? The
cross-lagged panel correlational technique was also employed to answer this
question. Results are shown in Table 5. Correlations between teacher
problem-solving attitude at Time 1 and student problem-solving attitude at Time 2
were significantly more positive than the correlations between teacher attitude
at Time 2 and student attitude at Time 1. Thus, initial teacher attitude seemed
to have a greater effect on final student attitude than initial student attitude
had on final teacher attitude.

When cross-lagged correlations were calculated on the data grouped 
by sex of student, the same directional relationships held between teacher atti- 
tude and student attitude for boys and girls separately as held for the total 
sample. 

Table 5

Cross-lagged Correlations:
Time 1 Teacher Attitude with Time 2 Student Attitude (r12)
and Time 2 Teacher Attitude with Time 1 Student Attitude (r21)

Cross-lagged
correlations     Attitude I    Attitude II    Total

r12                  .29           -.03         .13
r21                 -.47*          -.30*       -.37*

*Significant at p < .01 as determined by Fisher Z-transformation (see Hays,
1973).



Implications, Limitations, and Recommendations 

Information-oriented research, such as the present study, provides
insight into specific relationships between curriculum variables and suggests
directions for additional studies. This section of the chapter, then, presents the
implications and limitations of the study along with recommendations for
future research.

Student Problem-solving Attitudes 

If students in this study are reflective of those in a larger population, 
then most fourth-grade students do, indeed, possess favorable attitudes toward
problem solving. Though not a random sample, the relatively large number of
participating students strengthens the generalizability of the findings.

The problem-solving attitude scale developed for the study needs
further validation with other elementary school populations. An interesting
follow-up to the present study would be an observational investigation to
determine if students actually possess the kinds of problem-solving behaviors which
their responses to the problem-solving attitude scale indicate.

Teacher Problem-solving Attitudes

All teachers in the sample for the study indicated favorable attitudes 
toward problem solving, but, because there were only 30 of them, this finding 
may not be indicative of the larger population. Therefore, the teacher prob- 
lem-solving attitude scale needs more extensive validation with other popula- 
tions. The scale also could be used with prospective elementary school teachers 
to determine their attitudes towards mathematical problem solving. 

Student Problem-solving Performance 

The findings of the study indicate that fourth-grade students perform 
reasonably well on the first two parts of a test of mathematical problem solving 
which provides measures of comprehension, application, and problem solving.
However, most students did not perform well on the third part of the test, a
measure of problem-solving performance. The test by Romberg and Wearne
(Wearne, 1976) holds promise as a viable tool for providing information to
teachers and other school personnel about the problem-solving capabilities of
students. This test can help diagnose student difficulties in comprehension,
application, and problem solving. Once problem areas are diagnosed, teachers
can plan remedies. 

The fact that there were no significant differences between the
problem-solving performance of boys and girls in this study indicates that teachers
need not vary teaching techniques for the sexes. However, the fact that DMP
students performed significantly better than non-DMP students on the
comprehension and application portions of the test suggests that factors within the
DMP program produce this effect. It would be interesting to determine
whether similar differences exist in other populations of DMP and non-DMP
students.

Student Problem-solving Attitudes and Performance 

The significant and stable positive relationships found between student 
problem-solving attitude and student problem-solving performance suggest 
that the relationships between attitude and performance are the same for
problem solving as for mathematics in general. Because of these positive rela- 
tionships, it seems wise to foster favorable student reactions and sentiments 
toward all aspects of mathematical problem solving. 

Teacher Problem-solving Attitude and Student Problem-solving 
Performance 

The somewhat inconsistent findings in the relationships between
teacher problem-solving attitude and student problem-solving performance,
when coupled with the relatively small sample of classes upon which the
findings were based, suggest the need for gathering similar data from other similar
populations. This suggestion also is based upon the surprising negative
correlations that appeared in the DMP sample of the study. Clearly, more evidence
is needed before definitive judgments can be made.

Cause and Effect Relationships Between Teacher Attitude and
Student Attitude and Performance 

Though calls for replication of research studies are easily made, the 
findings of the second part of the study obviously demand such efforts. If the 
directional relationship is one way for teacher attitude and student perfor- 
mance, and the opposite direction for teacher attitude and student attitude, 
teachers should be aware of this situation. If this directional influence is pecu- 
liar to a particular population, then knowledge of that fact would be beneficial. 

The cross-lagged panel correlational technique (Campbell & Stanley, 
1963) holds promise as a valuable research design for inferring the cause and 
effect relationships between such variables as attitude and performance. As a 
follow-up to the present investigation, the author suggests that an improved 
use for the cross-lagged technique might involve initial problem-solving test- 
ing with students and teachers near the start of the school year and again at 
mid-year; this plan would reduce the confounding teacher-pupil effect occur- 
ring when initial testing is done several weeks into the school year. 

Concluding Remarks 

The study reported in this chapter investigated selected noncognitive 
factors and the mathematical problem-solving performance of fourth-grade 
children. As is often the case, the results have raised more questions than they 
have answered. In the author's opinion, the most important findings of the 
study are: (a) fourth-grade students and teachers seem to possess favorable 
attitudes toward mathematical problem solving; (b) fourth-grade students 
perform satisfactorily on comprehension and application items, but not on the 
problem-solving items of a three-part mathematical problem-solving test; and 
(c) there seems to be a significant and stable positive relationship between
student mathematical problem-solving performance and student
problem-solving attitude. The other findings of the study are important, but must be
viewed as tentative until validated with additional research. 
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